Abstract: We present a joint message passing approach that
combines belief propagation and the mean field approxima- 
tion. Our analysis is based on the region-based free energy 
approximation method proposed by Yedidia et al. We show that 
the message passing fixed-point equations obtained with this 
combination correspond to stationary points of a constrained 
region-based free energy approximation. Moreover, we present 
a convergent implementation of these message passing fixed- 
point equations provided that the underlying factor graph fulfills 
certain technical conditions. In addition, we show how to include 
hard constraints in the part of the factor graph corresponding 
to belief propagation. Finally, we demonstrate an application of 
our method to iterative channel estimation and decoding in an 
orthogonal frequency division multiplexing (OFDM) system. 

Index Terms — Message passing, belief propagation, iterative 
algorithms, iterative decoding, parameter estimation 



I. Introduction 

Variational techniques have been used for decades in quan- 
tum and statistical physics, where they are referred to as the 
mean field (MF) approximation [2]. Later, they found their
way to the area of machine learning or statistical inference,
see, e.g., [3]-[6]. The basic idea of variational inference is
to derive the statistics of "hidden" random variables given 
the knowledge of "visible" random variables of a certain 
probability density function (pdf). In the MF approximation,
this pdf is approximated by some "simpler," e.g., (fully) 
factorized pdf and the Kullback-Leibler divergence between 
the approximating and the true pdf is minimized, which can 
be done in an iterative, i.e., message passing like way. Apart 
from being fully factorized, the approximating pdf typically 
fulfills additional constraints that allow for messages with a 
simple structure, which can be updated in a simple way. For 
example, additional exponential conjugacy constraints result 
in messages propagating along the edges of the underlying
Bayesian network that are described by a small number of
parameters [5]. Variational inference methods were recently
applied in [7] to the channel state estimation/interference
cancellation part of a class of MIMO-OFDM receivers that 
iterate between detection, channel estimation, and decoding. 

An approach different from the MF approximation is belief
propagation (BP) [8]. Roughly speaking, with BP one
tries to find local approximations, which are (exactly or
approximately) the marginals of a certain pdf. This can also
be done in an iterative way, where messages are passed along
the edges of a factor graph [10]. A typical application of BP is
decoding of turbo or low density parity check (LDPC) codes. 
Based on the excellent performance of BP, many variations
have been derived in order to improve the performance of
this algorithm even further. For example, minimizing an upper
bound on the log partition function of a pdf leads to the
powerful tree reweighted BP algorithm [11]. An offspring of
this idea is the recently developed uniformly tree reweighted
BP algorithm [12]. Another example is [13], where methods
from information geometry are used to compute correction
terms for the beliefs obtained by loopy BP. An alternative
approach for turbo decoding that uses projections (that are
dual in the sense of [14, Ch. 3] to the one used in [13]) on
constraint subsets can be found in [15]. A combination of the
approaches used in [13] and in [15] can be found in [16].

Both methods, BP and the MF approximation, have their 
own virtues and disadvantages. For example, the MF approx- 
imation 

+ always admits a convergent implementation;
+ has simple message passing update rules, in particular
  for conjugate-exponential models;
- is not compatible with hard constraints,

and BP

+ yields a good approximation of the marginal
  distributions if the factor graph has no short cycles;
+ is compatible with hard constraints like, e.g.,
  code constraints;
- may have a high complexity, especially when applied
  to probabilistic models involving both discrete and
  continuous random variables.

Hence, it is of great benefit to apply BP and the MF approx- 
imation on the same factor graph in such a combination that 
their respective virtues can be exploited while circumventing 
their drawbacks. To this end, a unified message passing algorithm
is needed that allows for combining both approaches.

The fixed-point equations of both BP and the MF approxi- 
mation can be obtained by minimizing an approximation of the 
Kullback-Leibler divergence, called region-based free energy 
approximation. This approach differs from other methods, see, 
e.g., [17], because the starting point for the derivation of the
corresponding message passing fixed-point equations is the
same objective function for both BP and the MF approximation.
The main technical result of our work is Theorem 2,
where we show that the message passing fixed-point equations 
for such a combination of BP and the MF approximation 
correspond to stationary points of one single constrained 
region-based free energy approximation and provide a clear 
rule stating how to couple the messages propagating in the BP 
and MF part. In fact, based on the factor graph corresponding 
to a factorization of a probability mass function (pmf) and 
a choice for a separation of this factorization into BP and 
MF factors, Theorem 2 gives the message passing fixed-point
equations for the factor graph representing the whole factorization
of the pmf. One example of an application of Theorem
2 is joint channel estimation, interference cancellation, and
decoding. Typically, these tasks are considered separately and 
the coupling between them is described in a heuristic way. As
an example of this problem, there has been a debate in
the research community on whether a posteriori probabilities
(APP) or extrinsic values should be fed back from the decoder
to the rest of the receiver components; several authors agree
in proposing the use of extrinsic values for MIMO detection
[18]-[20] while using APP values for channel estimation [19],
[20], but no thorough justification for this choice is given apart
from the achieved superior performance shown by simulation
results. Besides having a clear rule to update the messages for
the whole factor graph representing a factorization of a pmf, 
an additional advantage is the fact that solutions of fixed-point 
equations for the messages are related to the stationary points 
of the corresponding constrained region-based free energy 
approximation. This correspondence is important because it 
yields an interpretation of the computed beliefs for arbitrary 
factor graphs similar to the case of solely BP, where solutions 
of the message passing fixed-point equations do in general not 
correspond to the true marginals if the factor graph has cycles 
but always correspond to stationary points of the constrained 
Bethe free energy [9]. Moreover, this observation allows us to
present a systematic way of updating the messages, namely
Algorithm 1, which is guaranteed to converge provided that the
factor graph representing the factorization of the pmf fulfills 
certain technical conditions. 

The paper is organized as follows. In the remainder of 
this section we fix our notation. Section II is devoted to the
introduction of the region-based free energy approximations
proposed by [9] and recalls how BP, the MF approximation,
and the EM algorithm [21] can be obtained by this method.
Since the MF approximation is typically used for parameter 
estimation, we briefly show how to extend it to the case 

^ An information geometric interpretation of the different objective functions
used in [17] can be found in [14, Ch. 2].



of continuous random variables using an approach presented 
already in [22, pp. 36-38] that avoids complicated methods
from variational calculus. Section III is the main part of this
work. There we state our main result, namely Theorem 2,
and show how the message passing fixed-point equations of a
combination of BP and the MF approximation can be related to
the stationary points of the corresponding constrained
region-based free energy approximation. We then (i) prove Lemma 2,
which generalizes Theorem 2 to the case where the factors of
the pmf in the BP part are no longer restricted to be strictly
positive real-valued functions, and (ii) present Algorithm 1,
which is a convergent implementation of the message passing
update equations presented in Theorem 2 provided that the
factor graph representing the factorization of the pmf fulfills
certain technical conditions. As a byproduct, (i) gives insights
into solely BP (which is a special case of the combination of
BP and the MF approximation) with hard constraints, where
only conjectures are formulated in [9]. In Section IV we apply
Algorithm 1 to joint channel estimation and decoding in an
OFDM system. More advanced receiver architectures together
with numerical simulations and a comparison with other state
of the art receivers can be found in [23] and an additional
application of the algorithm in a cooperative communications
scenario is presented in [24]. Finally, we conclude in Section
V and present an outlook for further research directions.

A. Notation 

Capital calligraphic letters $\mathcal{A}$, $\mathcal{I}$, $\mathcal{N}$ denote finite sets. The
cardinality of a set $\mathcal{I}$ is denoted by $|\mathcal{I}|$. If $i \in \mathcal{I}$ we write
$\mathcal{I} \setminus i$ for $\mathcal{I} \setminus \{i\}$. We use the convention that $\prod_{\emptyset}(\cdots) \triangleq 1$,
where $\emptyset$ denotes the empty set. For any finite set $\mathcal{I}$, $I_{\mathcal{I}}$
denotes the indicator function on $\mathcal{I}$, i.e., $I_{\mathcal{I}}(i) = 1$ if $i \in \mathcal{I}$
and $I_{\mathcal{I}}(i) = 0$ else. We denote by capital letters $X$ discrete
random variables with a finite number of realizations and
pmf $p_X$. For a random variable $X$, we use the convention
that $x$ is a representative for all possible realizations of $X$,
i.e., $x$ serves as a running variable, and denote a particular
realization by $\bar{x}$. For example, $\sum_x(\cdots)$ runs through all
possible realizations $x$ of $X$ and for two functions $f$ and $g$
depending on all realizations $x$ of $X$, $f(\bar{x}) = g(\bar{x})$ means
that $f(\bar{x}) = g(\bar{x})$ for each particular realization $\bar{x}$ of $X$.
If $F$ is a functional of a pmf $p_X$ of a random variable $X$
and $g$ is a function depending on all realizations $x$ of $X$,
then $\frac{\partial F}{\partial p_X(\bar{x})} = g(\bar{x})$ means that $\frac{\partial F}{\partial p_X(\bar{x})} = g(\bar{x})$ is well defined
and holds for each particular realization $\bar{x}$ of $X$. We write
$x = (x_i \mid i \in \mathcal{I})^T$ for the realizations of the vector of
random variables $X = (X_i \mid i \in \mathcal{I})^T$. If $i \in \mathcal{I}$, then
$\sum_{x \setminus x_i}(\cdots)$ runs through all possible realizations of $X$ but
$X_i$. For any nonnegative real valued function $f$ with argument
$x = (x_i \mid i \in \mathcal{I})^T$ and $i \in \mathcal{I}$, $f|_{\bar{x}_i}$ denotes $f$ with fixed
argument $x_i = \bar{x}_i$. If a function $f$ is identically zero, we
write $f \equiv 0$ and $f \not\equiv 0$ means that it is not identically
zero. For two real valued functions $f$ and $g$ with the same
domain and argument $x$, we write $f(x) \propto g(x)$ if $f = cg$ for
some real positive constant $c \in \mathbb{R}_+$. We use the convention
that $0 \ln(0) = 0$, $a \ln(\frac{a}{0}) = \infty$ if $a > 0$, and $0 \ln(\frac{0}{0}) = 0$
[25, p. 31]. For $x \in \mathbb{R}$, $\delta(x) = 1$ if $x = 0$ and zero else.
Matrices are denoted by capital boldface Greek letters. The
superscripts $T$ and $H$ stand for transposition and Hermitian
transposition, respectively. For a matrix $\Lambda \in \mathbb{C}^{m \times n}$, the entry
in the $i$th row and $j$th column is denoted by $\lambda_{ij} = [\Lambda]_{i,j}$.
For two vectors $x = (x_i \mid i \in \mathcal{I})^T$ and $y = (y_i \mid i \in \mathcal{I})^T$,
$x \odot y = (x_i y_i \mid i \in \mathcal{I})^T$ denotes the Hadamard product of
$x$ and $y$. Finally, $\mathrm{CN}(x; \mu, \Sigma)$ stands for the pdf of a jointly
proper complex Gaussian random vector $X \sim \mathrm{CN}(\mu, \Sigma)$ with
mean $\mu$ and covariance matrix $\Sigma$.

II. Known results 
A. Region-based free energy approximations [9]

Let $p_X$ be a certain positive pmf of a vector $X$ of random
variables $X_i$ ($i \in \mathcal{I}$) that factorizes as

$$p_X(x) = \prod_{a \in \mathcal{A}} f_a(x_a) \tag{1}$$

where $x \triangleq (x_i \mid i \in \mathcal{I})^T$ and $x_a \triangleq (x_i \mid i \in \mathcal{N}(a))^T$ with
$\mathcal{N}(a) \subseteq \mathcal{I}$ for all $a \in \mathcal{A}$. Without loss of generality we assume
that $\mathcal{A} \cap \mathcal{I} = \emptyset$, which can always be achieved by renaming
indices. Since $p_X$ is a strictly positive pmf, we can assume
without loss of generality that all the factors $f_a$ of $p_X$ in (1)
are real-valued positive functions. Later in Section III we shall
show how to relax the positivity constraint for some of these
factors. The factorization in (1) can be visualized in a factor
graph [10]. In a factor graph, $\mathcal{N}(a)$ is the set of all variable
nodes connected to a factor node $a \in \mathcal{A}$ and $\mathcal{N}(i)$ represents
the set of all factor nodes connected to a variable node $i \in \mathcal{I}$.
An example of a factor graph is depicted in Figure 1.

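A region $R \triangleq (\mathcal{I}_R, \mathcal{A}_R)$ consists of subsets of indices $\mathcal{I}_R \subseteq \mathcal{I}$
and $\mathcal{A}_R \subseteq \mathcal{A}$ with the restriction that $a \in \mathcal{A}_R$ implies that
$\mathcal{N}(a) \subseteq \mathcal{I}_R$. To each region $R$ we associate a counting number
$c_R \in \mathbb{Z}$. A set $\mathcal{R} \triangleq \{(R, c_R)\}$ of regions and associated
counting numbers is called valid if

$$\sum_{(R, c_R) \in \mathcal{R}} c_R\, I_{\mathcal{A}_R}(a) = \sum_{(R, c_R) \in \mathcal{R}} c_R\, I_{\mathcal{I}_R}(i) = 1$$

for all $a \in \mathcal{A}$, $i \in \mathcal{I}$.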

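^ For example, we can write

$$\mathcal{I} = \{1, 2, \ldots, |\mathcal{I}|\}, \qquad \mathcal{A} = \{\bar{1}, \bar{2}, \ldots, \overline{|\mathcal{A}|}\}.$$

This implies that any function that is defined pointwise on $\mathcal{A}$ and $\mathcal{I}$ is well
defined. For example, if in addition to the definition of the sets $\mathcal{N}(a)$ ($a \in \mathcal{A}$)
we set $\mathcal{N}(i) \triangleq \{a \in \mathcal{A} \mid i \in \mathcal{N}(a)\}$ for all $i \in \mathcal{I}$, the function

$$\begin{aligned} \mathcal{N} : \mathcal{I} \cup \mathcal{A} &\to \Pi(\mathcal{I} \cup \mathcal{A}) \\ a &\mapsto \mathcal{N}(a), \quad \text{for all } a \in \mathcal{A} \\ i &\mapsto \mathcal{N}(i), \quad \text{for all } i \in \mathcal{I} \end{aligned}$$

with $\Pi(\mathcal{I} \cup \mathcal{A})$ denoting the collection of all subsets of $\mathcal{I} \cup \mathcal{A}$ is well defined
because $i \neq a$ for all $i \in \mathcal{I}$, $a \in \mathcal{A}$.

^ Throughout the paper we work with Tanner factor graphs as opposed to
Forney factor graphs.

For a positive function $b$ approximating $p_X$, we define the
variational free energy [9]

$$F(b) \triangleq \sum_x b(x) \ln \frac{b(x)}{p_X(x)} = \underbrace{\sum_x b(x) \ln b(x)}_{\triangleq -H(b)} - \underbrace{\sum_x b(x) \ln p_X(x)}_{\triangleq -U(b)}. \tag{2}$$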

In (2), $H(b)$ denotes the entropy [25, p. 5] of $b$ and $U(b)$ is
called average energy of $b$. Note that $F(b)$ is the Kullback-Leibler
divergence [25, p. 19] between $b$ and $p_X$, i.e., $F(b) =
D(b \,\|\, p_X)$. For a set $\mathcal{R}$ of regions and associated counting
numbers, the region-based free energy approximation is defined
as $F_{\mathcal{R}} \triangleq U_{\mathcal{R}} - H_{\mathcal{R}}$ with

$$\begin{aligned} U_{\mathcal{R}} &\triangleq -\sum_{(R, c_R) \in \mathcal{R}} c_R \sum_{a \in \mathcal{A}_R} \sum_{x_R} b_R(x_R) \ln f_a(x_a) \\ H_{\mathcal{R}} &\triangleq -\sum_{(R, c_R) \in \mathcal{R}} c_R \sum_{x_R} b_R(x_R) \ln b_R(x_R). \end{aligned}$$

Here, each $b_R$ is defined locally on a region $R$. Instead of
minimizing $F$ with respect to $b$, we minimize $F_{\mathcal{R}}$ with respect
to all $b_R$ ($(R, c_R) \in \mathcal{R}$), where the $b_R$ have to fulfill certain
constraints. The quantities $b_R$ are called beliefs. We give two
examples of valid sets of regions and associated counting
numbers.

Example 2.1: The trivial example $\mathcal{R}_{\mathrm{MF}} \triangleq \{((\mathcal{I}, \mathcal{A}), 1)\}$. It
leads to the MF fixed-point equations, as will be shown in
Subsection II-C.

Example 2.2: We define two types of regions:

1) large regions: $R_a \triangleq (\mathcal{N}(a), \{a\})$, with $c_{R_a} = 1$ for all
$a \in \mathcal{A}$;

2) small regions: $R_i \triangleq (\{i\}, \emptyset)$, with $c_{R_i} = 1 - |\mathcal{N}(i)|$ for
all $i \in \mathcal{I}$.

Note that this definition is well defined due to our assumption
that $\mathcal{A} \cap \mathcal{I} = \emptyset$. The region-based free energy approximation
corresponding to the valid set of regions and associated
counting numbers

$$\mathcal{R}_{\mathrm{BP}} \triangleq \{(R_i, c_{R_i}) \mid i \in \mathcal{I}\} \cup \{(R_a, c_{R_a}) \mid a \in \mathcal{A}\}$$

is called the Bethe free energy [9], [26]. It leads to the BP
fixed-point equations, as will be shown in Subsection II-B.
The Bethe free energy is equal to the variational free energy
when the factor graph has no cycles [9].
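
The validity condition for the Bethe regions of Example 2.2 can be checked numerically. The following is a minimal sketch, not part of the original derivation: the helper names `is_valid` and `bethe_regions` and the three-factor graph are hypothetical, chosen only for illustration. Factors are named by strings and variables by integers so that the two index sets are disjoint.

```python
def is_valid(regions, variables, factors):
    """Check the validity condition: the counting numbers must sum to 1 over
    all regions containing a given factor and over all regions containing a
    given variable. `regions` is a list of (variable_set, factor_set, c_R)."""
    factors_ok = all(
        sum(c for _, fs, c in regions if a in fs) == 1 for a in factors
    )
    variables_ok = all(
        sum(c for vs, _, c in regions if i in vs) == 1 for i in variables
    )
    return factors_ok and variables_ok


def bethe_regions(neighbors):
    """Build the large and small regions of Example 2.2 from N(a), given as
    a dict mapping each factor name to the set of its variable indices."""
    variables = set().union(*neighbors.values())
    large = [(set(vs), {a}, 1) for a, vs in neighbors.items()]
    small = [
        ({i}, set(), 1 - sum(i in vs for vs in neighbors.values()))
        for i in sorted(variables)
    ]
    return large + small, variables


# Hypothetical factor graph with one cycle: a-(1,2), b-(2,3), c-(1,3).
neighbors = {"a": {1, 2}, "b": {2, 3}, "c": {1, 3}}
regions, variables = bethe_regions(neighbors)
print(is_valid(regions, variables, neighbors.keys()))
```

The check passes even though the graph has a cycle: validity concerns only the counting numbers, not the quality of the resulting approximation.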

B. BP fixed-point equations 

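The fixed-point equations for BP can be obtained from the
Bethe free energy by imposing additional marginalization and
normalization constraints and computing the stationary points
of the corresponding Lagrangian function [9], [27]. The Bethe
free energy reads

$$F_{\mathrm{BP}} = \sum_{a \in \mathcal{A}} \sum_{x_a} b_a(x_a) \ln \frac{b_a(x_a)}{f_a(x_a)} - \sum_{i \in \mathcal{I}} (|\mathcal{N}(i)| - 1) \sum_{x_i} b_i(x_i) \ln b_i(x_i) \tag{3}$$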

^ If $p_X$ is not normalized to one, the definition of the variational free energy
contains an additional normalization constant, called Helmholtz free energy
[9, pp. 4-5].
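with $b_a \triangleq b_{R_a}$ for all $a \in \mathcal{A}$, $b_i \triangleq b_{R_i}$ for all $i \in \mathcal{I}$, and
$F_{\mathrm{BP}} \triangleq F_{\mathcal{R}_{\mathrm{BP}}}$. The normalization constraints for the beliefs $b_a$
($a \in \mathcal{A}$) and the marginalization constraints for the beliefs $b_a$
and $b_i$ ($a \in \mathcal{A}$, $i \in \mathcal{N}(a)$) can be included in the Lagrangian
[28, Sec. 3.1.3]

$$L_{\mathrm{BP}} \triangleq F_{\mathrm{BP}} - \sum_{a \in \mathcal{A}} \sum_{i \in \mathcal{N}(a)} \sum_{x_i} \lambda_{a,i}(x_i) \Big( b_i(x_i) - \sum_{x_a \setminus x_i} b_a(x_a) \Big) - \sum_{a \in \mathcal{A}} \gamma_a \Big( \sum_{x_a} b_a(x_a) - 1 \Big). \tag{4}$$
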
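The stationary points of the Lagrangian in (4) are then related
to the BP fixed-point equations by the following theorem.

Theorem 1: [9, Th. 2] Stationary points of the Lagrangian
in (4) must be BP fixed-points with positive beliefs fulfilling

$$\begin{aligned} b_a(x_a) &= z_a f_a(x_a) \prod_{i \in \mathcal{N}(a)} n_{i \to a}(x_i), && \text{for all } a \in \mathcal{A} \\ b_i(x_i) &= \prod_{a \in \mathcal{N}(i)} m_{a \to i}(x_i), && \text{for all } i \in \mathcal{I} \end{aligned} \tag{5}$$

with

$$\begin{aligned} m_{a \to i}(x_i) &= z_a \sum_{x_a \setminus x_i} f_a(x_a) \prod_{j \in \mathcal{N}(a) \setminus i} n_{j \to a}(x_j) \\ n_{i \to a}(x_i) &= \prod_{c \in \mathcal{N}(i) \setminus a} m_{c \to i}(x_i) \end{aligned} \tag{6}$$

for all $a \in \mathcal{A}$, $i \in \mathcal{N}(a)$ and vice versa. Here, $z_a$ ($a \in \mathcal{A}$) are
positive constants that ensure that the beliefs $b_a$ ($a \in \mathcal{A}$) are
normalized to one.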

Often, the following alternative system of fixed-point equations
is solved instead of (6):

$$\begin{aligned} \tilde{m}_{a \to i}(x_i) &= \omega_{a,i} \sum_{x_a \setminus x_i} f_a(x_a) \prod_{j \in \mathcal{N}(a) \setminus i} \tilde{n}_{j \to a}(x_j) \\ \tilde{n}_{i \to a}(x_i) &= \prod_{c \in \mathcal{N}(i) \setminus a} \tilde{m}_{c \to i}(x_i) \end{aligned} \tag{7}$$

for all $a \in \mathcal{A}$, $i \in \mathcal{N}(a)$, where $\omega_{a,i}$ ($a \in \mathcal{A}$, $i \in \mathcal{N}(a)$) are
arbitrary positive constants. The reason for this is that for a
fixed scheduling the messages computed in (6) differ from the
messages computed in (7) only by positive constants, which
drop out when the beliefs are normalized. See also Eq.
(68) and Eq. (69)], where the "$\propto$" symbol is used in the
update equations indicating that the normalization constants
are irrelevant. A solution of (7) can be obtained, e.g., by
updating corresponding likelihood ratios of the messages in (6)
or by updating the messages according to (6) but ignoring the
normalization constants $z_a$ ($a \in \mathcal{A}$). The algorithm is considered
converged when the normalized beliefs no longer change. Therefore,
a rescaling of the messages is irrelevant and a solution of (7)
is obtained. However, we note that a rescaled solution of (7)
is not necessarily a solution of (6). Hence, the beliefs obtained
by solving (7) need not be stationary points of the Lagrangian
in (4). To the best of our knowledge, this elementary insight
has not yet been published in the literature, and we state a necessary
and sufficient condition for when a solution of (7) can be rescaled
to a solution of (6) in the following lemma.



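Lemma 1: Suppose that $\{\tilde{m}_{a \to i}(x_i), \tilde{n}_{i \to a}(x_i)\}$ ($a \in
\mathcal{A}$, $i \in \mathcal{N}(a)$) is a solution of (7) and set

$$\tilde{z}_a \triangleq \frac{1}{\sum_{x_a} f_a(x_a) \prod_{i \in \mathcal{N}(a)} \tilde{n}_{i \to a}(x_i)}, \quad \text{for all } a \in \mathcal{A}. \tag{8}$$

Then this solution can be rescaled to a solution of (6) if and
only if there exist positive constants $g_i$ ($i \in \mathcal{I}$) such that

$$\omega_{a,i} = g_i \tilde{z}_a, \quad \text{for all } a \in \mathcal{A},\ i \in \mathcal{N}(a). \tag{9}$$

Proof: See Appendix A.

Remark 2.1: Note that for factor graphs that have a tree-structure
the messages obtained by running the forward-backward
algorithm [10] always fulfill (9) because we have
$\omega_{a,i} = 1$ ($a \in \mathcal{A}$, $i \in \mathcal{N}(a)$) and $\tilde{z}_a = 1$ ($a \in \mathcal{A}$) in this case.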
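
Remark 2.1 can be illustrated on a small tree. The sketch below is a hedged illustration under stated assumptions: the chain x1 - a - x2 - b - x3, the function name `chain_bp`, and the numerical factors are hypothetical and not taken from the paper. It runs the update rule (7) with all constants set to 1 and evaluates the quantity defined in (8) for both factors.

```python
def chain_bp(fa, fb):
    """Sum-product as in (7) with all omega = 1 on the chain x1 - a - x2 - b - x3.
    fa[x1][x2] and fb[x2][x3] are positive; their product is normalized to a
    pmf inside the function. Returns (z_tilde_a, z_tilde_b, belief of x2)."""
    n1, n2, n3 = len(fa), len(fb), len(fb[0])
    total = sum(fa[i][j] * fb[j][k]
                for i in range(n1) for j in range(n2) for k in range(n3))
    fa = [[v / total for v in row] for row in fa]  # now sum_x p_X(x) = 1

    # Leaf variables send the empty product, i.e. all-ones messages.
    n1_a = [1.0] * n1
    n3_b = [1.0] * n3
    # Factor-to-variable messages into x2.
    m_a2 = [sum(fa[i][j] * n1_a[i] for i in range(n1)) for j in range(n2)]
    m_b2 = [sum(fb[j][k] * n3_b[k] for k in range(n3)) for j in range(n2)]
    # Variable-to-factor messages from x2 exclude the target factor.
    n2_a, n2_b = m_b2, m_a2

    # z_tilde from (8) for both factors.
    z_a = 1.0 / sum(fa[i][j] * n1_a[i] * n2_a[j]
                    for i in range(n1) for j in range(n2))
    z_b = 1.0 / sum(fb[j][k] * n2_b[j] * n3_b[k]
                    for j in range(n2) for k in range(n3))
    b2 = [m_a2[j] * m_b2[j] for j in range(n2)]
    return z_a, z_b, b2


z_a, z_b, b2 = chain_bp([[1.0, 2.0], [3.0, 0.5]],
                        [[0.2, 0.7, 1.1], [2.0, 0.4, 0.3]])
print(z_a, z_b, sum(b2))
```

Up to floating-point rounding, both constants equal one and the belief of x2 sums to one, so condition (9) holds with all constants equal to 1 on this tree.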

C. Fixed-point equations for the MF approximation 

A message passing interpretation of the MF approximation
was derived in [5], [29]. In this section, we briefly show how
the corresponding fixed-point equations can be obtained by the
free energy approach. To this end, we use $\mathcal{R}_{\mathrm{MF}}$ from Example
2.1 together with the factorization constraint

$$b(x) = \prod_{i \in \mathcal{I}} b_i(x_i). \tag{10}$$

Plugging (10) into the expression for the region-based free energy
approximation corresponding to the trivial approximation
$\mathcal{R}_{\mathrm{MF}}$ we get

$$F_{\mathrm{MF}} = \sum_{i \in \mathcal{I}} \sum_{x_i} b_i(x_i) \ln b_i(x_i) - \sum_{a \in \mathcal{A}} \sum_{x_a} \prod_{i \in \mathcal{N}(a)} b_i(x_i) \ln f_a(x_a) \tag{11}$$

with $F_{\mathrm{MF}} \triangleq F_{\mathcal{R}_{\mathrm{MF}}}$. Assuming that all the beliefs $b_i$ ($i \in \mathcal{I}$)
have to fulfill a normalization constraint, the stationary points
of the corresponding Lagrangian for the MF approximation
can easily be evaluated to be

$$b_i(x_i) = z_i \exp\Big( \sum_{a \in \mathcal{N}(i)} \sum_{x_a \setminus x_i} \prod_{j \in \mathcal{N}(a) \setminus i} b_j(x_j) \ln f_a(x_a) \Big) \tag{12}$$

for all $i \in \mathcal{I}$, where the positive constants $z_i$ ($i \in \mathcal{I}$) are such
that $b_i$ is normalized to one for all $i \in \mathcal{I}$.

^ For binary random variables with pmf in an exponential family it was
shown in [30] that this gives a good approximation whenever the truncation
of the Plefka expansion does not introduce a significant error.

^ The Lagrange multiplier [28, p. 283] for each belief $b_i$ ($i \in \mathcal{I}$)
corresponding to the normalization constraint can be absorbed into the positive
constant $z_i$ ($i \in \mathcal{I}$).

For the MF approximation there always exists a convergent
algorithm that computes beliefs $b_i$ ($i \in \mathcal{I}$) solving (12) by
simply using (12) as an iterative update equation for the
beliefs. Since for all $i \in \mathcal{I}$

$$\frac{\partial^2 F_{\mathrm{MF}}}{\partial b_i(x_i)^2} = \frac{1}{b_i(x_i)} > 0$$

and the set of all beliefs $b_i$ satisfying the normalization
constraint $\sum_{x_i} b_i(x_i) = 1$ is a convex set, the objective
function $F_{\mathrm{MF}}$ in (11) cannot increase and the algorithm is
guaranteed to converge. Note that in order to derive a particular
update $b_i$ ($i \in \mathcal{I}$) we need all previous updates $b_j$ with
$j \in \bigcup_{a \in \mathcal{N}(i)} \mathcal{N}(a) \setminus i$.

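By setting $n_{i \to a}(x_i) \triangleq b_i(x_i)$ for all $i \in \mathcal{I}$, $a \in \mathcal{N}(i)$, the
fixed-point equations in (12) are transformed into the message
passing fixed-point equations

$$\begin{aligned} n_{i \to a}(x_i) &= z_i \prod_{c \in \mathcal{N}(i)} m_{c \to i}(x_i) \\ m_{a \to i}(x_i) &= \exp\Big( \sum_{x_a \setminus x_i} \prod_{j \in \mathcal{N}(a) \setminus i} n_{j \to a}(x_j) \ln f_a(x_a) \Big) \end{aligned} \tag{13}$$

for all $a \in \mathcal{A}$, $i \in \mathcal{N}(a)$. The MF approximation can be
extended to the case where $p_X$ is a pdf, as shown in Appendix
B. Formally, each sum over $x_k$ ($k \in \mathcal{I}$) in (12) and (13)
has to be replaced by a Lebesgue integral whenever the
corresponding random variable is continuous.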
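
The monotonicity of the MF iteration can be observed numerically. The following is a minimal sketch under stated assumptions: a single positive factor over two discrete variables, with hypothetical helper names (`normalize`, `free_energy`, `mf_iterate`) and an arbitrary starting point. It applies the fixed-point equation (12) as a sequential update and records the MF free energy (11) after every sweep.

```python
import math


def normalize(v):
    s = sum(v)
    return [x / s for x in v]


def free_energy(f, b1, b2):
    """F_MF as in (11) for a single positive factor f[x1][x2]."""
    neg_entropy = sum(p * math.log(p) for b in (b1, b2) for p in b)
    avg_log_f = sum(b1[i] * b2[j] * math.log(f[i][j])
                    for i in range(len(b1)) for j in range(len(b2)))
    return neg_entropy - avg_log_f


def mf_iterate(f, sweeps=10):
    """Use (12) as a sequential update rule; return beliefs and the F_MF trace."""
    b1 = normalize([1.0] * len(f))
    b2 = normalize([1.0 + j for j in range(len(f[0]))])  # arbitrary positive start
    trace = [free_energy(f, b1, b2)]
    for _ in range(sweeps):
        b1 = normalize([
            math.exp(sum(b2[j] * math.log(f[i][j]) for j in range(len(b2))))
            for i in range(len(b1))
        ])
        b2 = normalize([
            math.exp(sum(b1[i] * math.log(f[i][j]) for i in range(len(b1))))
            for j in range(len(b2))
        ])
        trace.append(free_energy(f, b1, b2))
    return b1, b2, trace


b1, b2, trace = mf_iterate([[4.0, 1.0], [1.0, 3.0]])
print(trace)
```

Because the factor in this sketch is not normalized, the recorded values are bounded below by the negative logarithm of the sum of all factor entries rather than by zero.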



D. Expectation maximization (EM) 

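Message passing interpretations for EM [21] were derived in
[31], [32]. It can be shown that EM is a special instance of the
MF approximation [33, Sec. 2.3.1], which can be summarized
as follows. Suppose that we apply the MF approximation to
$p_X$ in (1) as described before. In addition, we assume that
for all $i \in \mathcal{E} \subseteq \mathcal{I}$ the beliefs $b_i$ fulfill the constraints that
$b_i(x_i) = \delta(x_i - \tilde{x}_i)$. Using the fact that $0 \ln(0) = 0$, we can
rewrite $F_{\mathrm{MF}}$ in (11) as

$$F_{\mathrm{MF}} = \sum_{i \in \mathcal{I} \setminus \mathcal{E}} \sum_{x_i} b_i(x_i) \ln b_i(x_i) - \sum_{a \in \mathcal{A}} \sum_{x_a} \prod_{i \in \mathcal{N}(a)} b_i(x_i) \ln f_a(x_a). \tag{14}$$

For all $i \in \mathcal{I} \setminus \mathcal{E}$ the stationary points of $F_{\mathrm{MF}}$ in (14) have the
same analytical expression as the one obtained in (12). For
$i \in \mathcal{E}$, minimizing $F_{\mathrm{MF}}$ in (14) with respect to $\tilde{x}_i$ yields

$$\tilde{x}_i = \operatorname*{argmin}_{x_i} (F_{\mathrm{MF}}) = \operatorname*{argmax}_{x_i} \bigg( \exp\Big( \sum_{a \in \mathcal{N}(i)} \sum_{x_a \setminus x_i} \prod_{j \in \mathcal{N}(a) \setminus i} b_j(x_j) \ln f_a(x_a) \Big) \bigg).$$

Setting $n_{i \to a}(x_i) \triangleq b_i(x_i)$ for all $i \in \mathcal{I}$, $a \in \mathcal{N}(i)$, we get the
message passing update equations defined in (13) except that
we have to replace the messages $n_{i \to a}(x_i)$ for all $i \in \mathcal{E}$ and
$a \in \mathcal{N}(i)$ by

$$n_{i \to a}(x_i) = \delta(x_i - \tilde{x}_i)$$

with

$$\tilde{x}_i = \operatorname*{argmax}_{x_i} \Big( \prod_{a \in \mathcal{N}(i)} m_{a \to i}(x_i) \Big).$$
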

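III. Combined BP / MF approximation fixed-point equations

Let

$$p_X(x) = \prod_{a \in \mathcal{A}_{\mathrm{MF}}} f_a(x_a) \prod_{b \in \mathcal{A}_{\mathrm{BP}}} f_b(x_b) \tag{15}$$

be a partially factorized pmf with $\mathcal{A}_{\mathrm{MF}} \cap \mathcal{A}_{\mathrm{BP}} = \emptyset$ and $\mathcal{A} \triangleq
\mathcal{A}_{\mathrm{MF}} \cup \mathcal{A}_{\mathrm{BP}}$. As before, we have $x \triangleq (x_i \mid i \in \mathcal{I})^T$, $x_a \triangleq (x_i \mid
i \in \mathcal{N}(a))^T$, with $\mathcal{N}(a) \subseteq \mathcal{I}$ for all $a \in \mathcal{A}$ and $\mathcal{N}(i) \triangleq
\{a \in \mathcal{A} \mid i \in \mathcal{N}(a)\}$ for all $i \in \mathcal{I}$. We refer to the factor
graph representing the factorization $\prod_{a \in \mathcal{A}_{\mathrm{BP}}} f_a(x_a)$ in (15) as
"BP part" and to the factor graph representing the factorization
$\prod_{a \in \mathcal{A}_{\mathrm{MF}}} f_a(x_a)$ in (15) as "MF part". Furthermore, we set

$$\mathcal{I}_{\mathrm{MF}} \triangleq \bigcup_{a \in \mathcal{A}_{\mathrm{MF}}} \mathcal{N}(a), \qquad \mathcal{I}_{\mathrm{BP}} \triangleq \bigcup_{a \in \mathcal{A}_{\mathrm{BP}}} \mathcal{N}(a)$$

and

$$\mathcal{N}_{\mathrm{MF}}(i) \triangleq \mathcal{A}_{\mathrm{MF}} \cap \mathcal{N}(i), \qquad \mathcal{N}_{\mathrm{BP}}(i) \triangleq \mathcal{A}_{\mathrm{BP}} \cap \mathcal{N}(i).$$

Next, we define the following regions and counting numbers:

1) one MF region $R_{\mathrm{MF}} \triangleq (\mathcal{I}_{\mathrm{MF}}, \mathcal{A}_{\mathrm{MF}})$, with $c_{R_{\mathrm{MF}}} = 1$;

2) small regions $R_i \triangleq (\{i\}, \emptyset)$, with $c_{R_i} = 1 - |\mathcal{N}_{\mathrm{BP}}(i)| -
I_{\mathcal{I}_{\mathrm{MF}}}(i)$ for all $i \in \mathcal{I}_{\mathrm{BP}}$;

3) large regions $R_a \triangleq (\mathcal{N}(a), \{a\})$, with $c_{R_a} = 1$ for all
$a \in \mathcal{A}_{\mathrm{BP}}$.

This yields the valid set of regions and associated counting
numbers

$$\mathcal{R}_{\mathrm{BP,MF}} \triangleq \{(R_i, c_{R_i}) \mid i \in \mathcal{I}_{\mathrm{BP}}\} \cup \{(R_a, c_{R_a}) \mid a \in \mathcal{A}_{\mathrm{BP}}\} \cup \{(R_{\mathrm{MF}}, c_{R_{\mathrm{MF}}})\}. \tag{16}$$

The additional terms $I_{\mathcal{I}_{\mathrm{MF}}}(i)$ in the counting numbers of
the small regions $R_i$ ($i \in \mathcal{I}_{\mathrm{BP}}$) defined in 2) compared to
the counting numbers of the small regions for the Bethe
approximation (see Example 2.2) guarantee that $\mathcal{R}_{\mathrm{BP,MF}}$ is
indeed a valid set of regions and associated counting numbers.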

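The valid set of regions and associated counting numbers
in (16) gives the region-based free energy approximation

$$\begin{aligned} F_{\mathrm{BP,MF}} = {} & \sum_{a \in \mathcal{A}_{\mathrm{BP}}} \sum_{x_a} b_a(x_a) \ln \frac{b_a(x_a)}{f_a(x_a)} - \sum_{a \in \mathcal{A}_{\mathrm{MF}}} \sum_{x_a} \prod_{i \in \mathcal{N}(a)} b_i(x_i) \ln f_a(x_a) \\ & - \sum_{i \in \mathcal{I}} (|\mathcal{N}_{\mathrm{BP}}(i)| - 1) \sum_{x_i} b_i(x_i) \ln b_i(x_i) \end{aligned} \tag{17}$$

with $F_{\mathrm{BP,MF}} \triangleq F_{\mathcal{R}_{\mathrm{BP,MF}}}$. In (17), we have already plugged in
the factorization constraint

$$b_{\mathrm{MF}}(x_{\mathrm{MF}}) = \prod_{i \in \mathcal{I}_{\mathrm{MF}}} b_i(x_i)$$

with $x_{\mathrm{MF}} \triangleq (x_i \mid i \in \mathcal{I}_{\mathrm{MF}})^T$ and $b_{\mathrm{MF}} \triangleq b_{R_{\mathrm{MF}}}$. The beliefs $b_i$
($i \in \mathcal{I}$) and $b_a$ ($a \in \mathcal{A}_{\mathrm{BP}}$) have to fulfill the normalization
constraints

$$\begin{aligned} \sum_{x_i} b_i(x_i) &= 1, && \text{for all } i \in \mathcal{I}_{\mathrm{MF}} \setminus \mathcal{I}_{\mathrm{BP}} \\ \sum_{x_a} b_a(x_a) &= 1, && \text{for all } a \in \mathcal{A}_{\mathrm{BP}} \end{aligned} \tag{18}$$
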
and the marginalization constraints

$$b_i(x_i) = \sum_{x_a \setminus x_i} b_a(x_a), \quad \text{for all } a \in \mathcal{A}_{\mathrm{BP}},\ i \in \mathcal{N}(a). \tag{19}$$

Remark 3.1: Note that there is no need to introduce normalization
constraints for the beliefs $b_i$ ($i \in \mathcal{I}_{\mathrm{BP}}$). If $a \in \mathcal{N}_{\mathrm{BP}}(i)$,
then it follows from the normalization constraint for the belief
$b_a$ and marginalization constraint for the beliefs $b_a$ and $b_i$ that

$$1 = \sum_{x_a} b_a(x_a) = \sum_{x_i} \Big( \sum_{x_a \setminus x_i} b_a(x_a) \Big) = \sum_{x_i} b_i(x_i).$$

We will show in Lemma 2 that the region-based free energy
approximation in (17) fulfilling the constraints (18) and (19)
is a finite quantity, i.e., that $-\infty < F_{\mathrm{BP,MF}} < \infty$.

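The constraints (18) and (19) can be included in the
Lagrangian [28, Sec. 3.1.3]

$$\begin{aligned} L_{\mathrm{BP,MF}} \triangleq {} & F_{\mathrm{BP,MF}} - \sum_{a \in \mathcal{A}_{\mathrm{BP}}} \sum_{i \in \mathcal{N}(a)} \sum_{x_i} \lambda_{a,i}(x_i) \Big( b_i(x_i) - \sum_{x_a \setminus x_i} b_a(x_a) \Big) \\ & - \sum_{i \in \mathcal{I}_{\mathrm{MF}} \setminus \mathcal{I}_{\mathrm{BP}}} \gamma_i \Big( \sum_{x_i} b_i(x_i) - 1 \Big) - \sum_{a \in \mathcal{A}_{\mathrm{BP}}} \gamma_a \Big( \sum_{x_a} b_a(x_a) - 1 \Big). \end{aligned} \tag{20}$$

The stationary points of the Lagrangian $L_{\mathrm{BP,MF}}$ in (20) are
then obtained by setting the derivatives of $L_{\mathrm{BP,MF}}$ with respect
to the beliefs and the Lagrange multipliers equal to zero.
The following theorem relates the stationary points of the
Lagrangian $L_{\mathrm{BP,MF}}$ to solutions of fixed-point equations for
the beliefs.

Theorem 2: Stationary points of the Lagrangian in (20) in
the combined BP-MF approach must be fixed-points with
positive beliefs fulfilling

$$\begin{aligned} b_a(x_a) &= z_a f_a(x_a) \prod_{i \in \mathcal{N}(a)} n_{i \to a}(x_i), && \text{for all } a \in \mathcal{A}_{\mathrm{BP}} \\ b_i(x_i) &= z_i \prod_{a \in \mathcal{N}_{\mathrm{BP}}(i)} m^{\mathrm{BP}}_{a \to i}(x_i) \prod_{a \in \mathcal{N}_{\mathrm{MF}}(i)} m^{\mathrm{MF}}_{a \to i}(x_i), && \text{for all } i \in \mathcal{I} \end{aligned} \tag{21}$$

with

$$\begin{aligned} n_{i \to a}(x_i) &= z_i \prod_{c \in \mathcal{N}_{\mathrm{BP}}(i) \setminus a} m^{\mathrm{BP}}_{c \to i}(x_i) \prod_{c \in \mathcal{N}_{\mathrm{MF}}(i)} m^{\mathrm{MF}}_{c \to i}(x_i), && \text{for all } a \in \mathcal{A},\ i \in \mathcal{N}(a) \\ m^{\mathrm{BP}}_{a \to i}(x_i) &= z_a \sum_{x_a \setminus x_i} f_a(x_a) \prod_{j \in \mathcal{N}(a) \setminus i} n_{j \to a}(x_j), && \text{for all } a \in \mathcal{A}_{\mathrm{BP}},\ i \in \mathcal{N}(a) \\ m^{\mathrm{MF}}_{a \to i}(x_i) &= \exp\Big( \sum_{x_a \setminus x_i} \prod_{j \in \mathcal{N}(a) \setminus i} n_{j \to a}(x_j) \ln f_a(x_a) \Big), && \text{for all } a \in \mathcal{A}_{\mathrm{MF}},\ i \in \mathcal{N}(a) \end{aligned} \tag{22}$$

and vice versa. Here, $z_i$ ($i \in \mathcal{I}$) and $z_a$ ($a \in \mathcal{A}_{\mathrm{BP}}$) are positive
constants that ensure that the beliefs $b_i$ ($i \in \mathcal{I}$) and $b_a$ ($a \in \mathcal{A}_{\mathrm{BP}}$)
are normalized to one with $z_i = 1$ for all $i \in \mathcal{I}_{\mathrm{BP}}$.

Proof: See Appendix C.

Remark 3.2: Note that for each $k \in \mathcal{I} \setminus \mathcal{I}_{\mathrm{BP}}$ Theorem 2 can
be generalized to the case where $X_k$ is a continuous random
variable following the derivation presented in Appendix B.
Formally, each sum over $x_k$ with $k \in \mathcal{I} \setminus \mathcal{I}_{\mathrm{BP}}$ in the third identity
in (22) has to be replaced by a Lebesgue integral whenever
the corresponding random variable $X_k$ is continuous.

Remark 3.3: Note that Theorem 2 clearly states whether
"extrinsic" values or "APPs" should be passed. In fact, the
first equation in (22) implies that each message $n_{i \to a}(x_i)$ ($a \in
\mathcal{A}$, $i \in \mathcal{N}(a)$) is an "extrinsic" value when $a \in \mathcal{A}_{\mathrm{BP}}$ and an "APP"
when $a \in \mathcal{A}_{\mathrm{MF}}$.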
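
The coupling rule for the variable-to-factor messages in the combined approach can be made concrete with a few lines of code. This is a hedged sketch with hypothetical names and numbers: `n_message` and the message tables are illustrative only, and the variable is assumed to lie in the BP part so that its scaling constant equals one. A message sent to a BP factor omits that factor's own incoming message (an "extrinsic" value), whereas a message sent to an MF factor is the full product (an "APP").

```python
from math import isclose, prod


def n_message(m_bp, m_mf, target, num_states):
    """Variable-to-factor message for one variable i.

    m_bp / m_mf map the names of the neighboring BP / MF factors to their
    incoming messages (lists indexed by the state of x_i). If `target` is a
    BP factor its own message is left out; if it is an MF factor nothing is
    left out, so the result is the unnormalized belief of x_i."""
    return [
        prod(m[x] for c, m in m_bp.items() if c != target)
        * prod(m[x] for m in m_mf.values())
        for x in range(num_states)
    ]


# Hypothetical incoming messages for a binary variable.
m_bp = {"a": [0.2, 0.8], "b": [0.5, 0.5]}
m_mf = {"m": [0.9, 0.1]}
print(n_message(m_bp, m_mf, "a", 2))  # extrinsic: excludes factor "a"
print(n_message(m_bp, m_mf, "m", 2))  # APP: product of all incoming messages
```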



A. Hard constraints for BP 

Some suggestions on how to generalize Theorem 1 ([9,
Th. 2]) to hard constraints, i.e., to the case where the factors
of the pmf $p_X$ are not restricted to be strictly positive
real-valued functions, can be found in [9, Sec. VI.D]. Examples
of hard constraints are deterministic functions such as code
constraints. However, the statements formulated there are only
conjectures and are based on the assumption that we can
always compute the derivative of the Lagrange function with
respect to the beliefs. This is not always possible because

$$\frac{\partial F_{\mathrm{BP}}}{\partial b_a(x_a)} \to \infty, \quad \text{as } f_a(x_a) \to 0$$

with $F_{\mathrm{BP}}$ from (3). In the sequel, we show how to generalize
Theorem 2 to the case where $f_a \geq 0$ for all $a \in \mathcal{A}_{\mathrm{BP}}$ based
on the simple observation that we are interested in solutions
where the region-based free energy approximation is not plus
infinity (recall that we want to minimize this quantity). As
a byproduct, this also yields an extension of Theorem 1 ([9,
Th. 2]) to hard constraints by simply setting $\mathcal{A}_{\mathrm{MF}} = \emptyset$.
Lemma 2: Suppose that

$$f_a \geq 0, \quad \text{for all } a \in \mathcal{A}_{\mathrm{BP}} \tag{23}$$

$$f_a > 0, \quad \text{for all } a \in \mathcal{A}_{\mathrm{MF}} \tag{24}$$

and $p_X|_{\bar{x}_i} \not\equiv 0$ for all $i \in \mathcal{I}$ and each realization $\bar{x}_i$ of $X_i$.
Furthermore, we assume that $b_i$ ($i \in \mathcal{I}$) and $b_a$ ($a \in \mathcal{A}_{\mathrm{BP}}$)
fulfill the constraints (18) and (19). Then

1) $F_{\mathrm{BP,MF}} > -\infty$;

2) The condition

$$b_a(\bar{x}_a) = 0, \quad \text{for all } \bar{x}_a \text{ with } a \in \mathcal{A}_{\mathrm{BP}},\ f_a(\bar{x}_a) = 0 \tag{25}$$

is necessary and sufficient for $F_{\mathrm{BP,MF}} < \infty$;

3) If (25) is fulfilled, the remaining stationary points $b_i(x_i)$
($i \in \mathcal{I}$) and $b_a(x_a)$ excluding all $\bar{x}_a$ from (25) ($a \in \mathcal{A}_{\mathrm{BP}}$)
of the Lagrangian in (20) are positive beliefs fulfilling
(21) and (22) excluding all $\bar{x}_a$ from (25) and vice versa.

4) Moreover, (21) and (22) hold for all realizations $\bar{x}_a$
(including all $\bar{x}_a$ from (25)) and, therefore, (21) contains
(25) as a special case.

^ If $p_X|_{\bar{x}_i} \equiv 0$, then we can simply remove this realization $\bar{x}_i$ of $X_i$.

Proof: See Appendix D.

Remark 3.4: At first sight it seems to be a contradiction to
the marginalization constraints (19) that (25) holds and all the
beliefs $b_i$ ($i \in \mathcal{I}_{\mathrm{BP}}$) are strictly positive functions. To see
that there is no contradiction, let $i \in \mathcal{I}_{\mathrm{BP}}$, $a \in \mathcal{N}_{\mathrm{BP}}(i)$, and
fix one realization $\bar{x}_i$ of $X_i$. Since $p_X|_{\bar{x}_i} \not\equiv 0$ we also have
$f_a|_{\bar{x}_i} \not\equiv 0$. This implies that $f_a(\bar{x}_a) \neq 0$ for at least one
realization $\bar{x}_a = (\bar{x}_j \mid j \in \mathcal{N}(a))^T$ with $i \in \mathcal{N}(a)$ and,
therefore, $b_a(\bar{x}_a) \neq 0$. The marginalization constraints (19)
together with the fact that the belief $b_a$ must be a nonnegative
function then imply that we have indeed $b_i(\bar{x}_i) > 0$.

B. Convergence and main algorithm 
If the BP part has no cycle and

$$|\mathcal{N}(a) \cap \mathcal{I}_{\mathrm{BP}}| \leq 1, \quad \text{for all } a \in \mathcal{A}_{\mathrm{MF}} \tag{26}$$

then there exists a convergent implementation of the combined
message passing equations in (22). In fact, we can iterate
between updating the beliefs $b_i$ with $i \in \mathcal{I}_{\mathrm{MF}} \setminus \mathcal{I}_{\mathrm{BP}}$ and the
forward backward algorithm in the BP part, as outlined in the
following Algorithm.

Algorithm 1: If the BP part has no cycle and (26) is
fulfilled, the following implementation of the fixed-point
equations in (22) is guaranteed to converge.

1) Initialize $b_i$ for all $i \in \mathcal{I}_{\mathrm{MF}} \setminus \mathcal{I}_{\mathrm{BP}}$ and send the
corresponding messages $n_{i \to a}(x_i) = b_i(x_i)$ to all factor
nodes $a \in \mathcal{N}_{\mathrm{MF}}(i)$.

2) Use all messages $m^{\mathrm{MF}}_{a \to i}(x_i)$ with $i \in \mathcal{I}_{\mathrm{BP}} \cap \mathcal{I}_{\mathrm{MF}}$ and
$a \in \mathcal{N}_{\mathrm{MF}}(i)$ as fixed input for the BP part and run
the forward/backward algorithm [10]. The fact that the
resulting beliefs $b_i$ with $i \in \mathcal{I}_{\mathrm{BP}}$ cannot increase the
region-based free energy approximation in (17) is proved
in Appendix E.

3) For each $i \in \mathcal{I}_{\mathrm{MF}} \cap \mathcal{I}_{\mathrm{BP}}$ and $a \in \mathcal{N}_{\mathrm{MF}}(i)$ the message
$n_{i \to a}(x_i)$ is now available and can be used for further
updates in the MF part.

4) For each $i \in \mathcal{I}_{\mathrm{MF}} \setminus \mathcal{I}_{\mathrm{BP}}$ successively recompute the
message $n_{i \to a}(x_i)$ and send it to all $a \in \mathcal{N}_{\mathrm{MF}}(i)$. Note
that for all indices $i \in \mathcal{I}_{\mathrm{MF}} \setminus \mathcal{I}_{\mathrm{BP}}$

$$\frac{\partial^2 F_{\mathrm{BP,MF}}}{\partial b_i(x_i)^2} = \frac{1}{b_i(x_i)} > 0$$

and the set of all beliefs $b_i$ satisfying the normalization
constraint (first equation in (18)) is a convex set. This
implies that for each $i \in \mathcal{I}_{\mathrm{MF}} \setminus \mathcal{I}_{\mathrm{BP}}$ we are solving
a convex optimization problem. Therefore, the region-based
free energy approximation in (17) cannot increase.

5) Proceed as described in 2).

Remark 3.5: If the factor graph representing the BP part is
not cycle-free then Algorithm 1 can be modified by running
loopy BP in step 2). However, in this case the algorithm is
not guaranteed to converge.
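
The alternation between the BP part and the MF part can be sketched on a toy factor graph. Everything in the following block is an assumption made for illustration: the graph (one BP factor over x1 and x2 that may contain zeros, one strictly positive MF factor over x2 and x3), the function names, and the numbers. The BP part is a single factor, hence cycle-free, and the MF factor touches exactly one BP variable, so the condition on the MF factors is met. The free energy of the combined approach is recorded after each pass to observe that it does not increase.

```python
import math


def normalize(v):
    s = sum(v)
    return [x / s for x in v]


def free_energy(f_b, f_m, b_b, b2, b3):
    """Combined BP-MF free energy for the toy graph, with 0 ln 0 = 0."""
    bp_term = sum(p * math.log(p / f_b[i][j])
                  for i, row in enumerate(b_b) for j, p in enumerate(row) if p > 0)
    mf_term = sum(b2[j] * b3[k] * math.log(f_m[j][k])
                  for j in range(len(b2)) for k in range(len(b3)))
    # |N_BP(i)| - 1 equals 0 for x1 and x2 and -1 for x3.
    return bp_term - mf_term + sum(p * math.log(p) for p in b3)


def run_bp_mf(f_b, f_m, iters=15):
    n1, n2, n3 = len(f_b), len(f_m), len(f_m[0])
    b3 = normalize([1.0] * n3)  # step 1: initialize the MF-only belief
    trace = []
    for _ in range(iters):
        # step 2: MF message into x2, then exact BP on the single-factor BP part
        m_mf = [math.exp(sum(b3[k] * math.log(f_m[j][k]) for k in range(n3)))
                for j in range(n2)]
        w = [[f_b[i][j] * m_mf[j] for j in range(n2)] for i in range(n1)]
        z = sum(sum(row) for row in w)
        b_b = [[v / z for v in row] for row in w]
        # step 3: the message from x2 to the MF factor is the belief b2
        b2 = [sum(b_b[i][j] for i in range(n1)) for j in range(n2)]
        # step 4: update the MF-only belief b3
        b3 = normalize([
            math.exp(sum(b2[j] * math.log(f_m[j][k]) for j in range(n2)))
            for k in range(n3)
        ])
        trace.append(free_energy(f_b, f_m, b_b, b2, b3))
    return b_b, b2, b3, trace


# Hard constraint x1 = x2 in the BP part, soft coupling of x2 and x3 in the MF part.
b_b, b2, b3, trace = run_bp_mf([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.3, 0.7]])
print(trace)
```

In this sketch the belief of the BP factor is exactly zero wherever the factor is zero, which is the behavior the hard-constraint lemma describes.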

IV. Application to iterative channel estimation and decoding

In this section, we present an example where we show
how to compute the updates of the messages in (22) based
on Algorithm 1. We choose a simple communication model
where the updates of the messages are simple enough to
avoid overloaded notation. A class of more complex
MIMO-OFDM receiver architectures together with numerical
simulations can be found in [23]. In our example, we use
BP for modulation and decoding and the MF approximation
for estimating the parameters of the a posteriori distribution
of the channel gains. This splitting is convenient because BP
works well with hard constraints and the MF approximation
yields very simple message passing update equations due to
the fact that the MF part in our example is a
conjugate-exponential model [5]. Applying BP to all factor nodes would
be intractable because the complexity is too high, cf. the
discussion in Subsection IV-C.

Specifically, we consider an OFDM system with $M + N$ active subcarriers. We denote by $\mathcal{D} \subseteq [1 : M + N]$ and $\mathcal{P} \subseteq [1 : M + N]$ the sets of subcarrier indices for the data and pilot symbols, respectively, with $|\mathcal{P}| = M$, $|\mathcal{D}| = N$, and $\mathcal{P} \cap \mathcal{D} = \emptyset$.

In the transmitter, a random vector $\mathbf{U} = (U_i \mid i \in [1 : K])^{\mathrm{T}}$ representing the information bits is encoded and interleaved using a rate $R = K/(LN)$ encoder and a random interleaver, respectively, into the random vector

$$\mathbf{C} = \left(\mathbf{C}^{(1)\mathrm{T}}, \ldots, \mathbf{C}^{(N)\mathrm{T}}\right)^{\mathrm{T}}$$

of length $LN$ representing the coded and interleaved bits. Each random subvector $\mathbf{C}^{(n)} = (C^{(n)}_1, \ldots, C^{(n)}_L)^{\mathrm{T}}$ of length $L$ is then mapped, i.e., modulated, to $X_{i_n} \in \mathcal{S}$ with $i_n \in \mathcal{D}$ ($n \in [1 : N]$), where $\mathcal{S}$ is a complex modulation alphabet of size $|\mathcal{S}| = 2^L$.

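After removing the cyclic prefix in the receiver, we get the following input-output relationship in the frequency domain:

$$\mathbf{Y}_{\mathcal{D}} = \mathbf{H}_{\mathcal{D}} \odot \mathbf{X}_{\mathcal{D}} + \mathbf{Z}_{\mathcal{D}}$$
$$\mathbf{Y}_{\mathcal{P}} = \mathbf{H}_{\mathcal{P}} \odot \mathbf{x}_{\mathcal{P}} + \mathbf{Z}_{\mathcal{P}} \qquad (27)$$

where $\odot$ denotes the elementwise product, $\mathbf{X}_{\mathcal{D}} = (X_i \mid i \in \mathcal{D})^{\mathrm{T}}$ is the random vector corresponding to the transmitted data symbols, $\mathbf{x}_{\mathcal{P}} = (x_i \mid i \in \mathcal{P})^{\mathrm{T}}$ is the vector containing the transmitted pilot symbols, and $\mathbf{H}_{\mathcal{D}} = (H_i \mid i \in \mathcal{D})^{\mathrm{T}}$ and $\mathbf{H}_{\mathcal{P}} = (H_i \mid i \in \mathcal{P})^{\mathrm{T}}$ are random vectors representing the multiplicative action of the channel, while $\mathbf{Z}_{\mathcal{D}} = (Z_i \mid i \in \mathcal{D})^{\mathrm{T}}$ and $\mathbf{Z}_{\mathcal{P}} = (Z_i \mid i \in \mathcal{P})^{\mathrm{T}}$ are random vectors representing additive Gaussian noise with $p_{\mathbf{Z}}(\mathbf{z}) = \mathrm{CN}(\mathbf{z}; \mathbf{0}, \gamma^{-1} \mathbf{I}_{M+N})$ and $\mathbf{Z} = (Z_i \mid i \in \mathcal{D} \cup \mathcal{P})^{\mathrm{T}}$. Note that (27) is very general and can also be used to model, e.g., a time-varying frequency-flat channel.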

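Setting $\mathbf{Y} = (Y_i \mid i \in \mathcal{D} \cup \mathcal{P})^{\mathrm{T}}$ and $\mathbf{H} = (H_i \mid i \in \mathcal{D} \cup \mathcal{P})^{\mathrm{T}}$, the pdf $p_{\mathbf{Y},\mathbf{X}_{\mathcal{D}},\mathbf{H},\mathbf{C},\mathbf{U}}$ admits the factorization

$$
\begin{aligned}
p_{\mathbf{Y},\mathbf{X}_{\mathcal{D}},\mathbf{H},\mathbf{C},\mathbf{U}}(\mathbf{y}, \mathbf{x}_{\mathcal{D}}, \mathbf{h}, \mathbf{c}, \mathbf{u})
&= p_{\mathbf{Y}|\mathbf{X}_{\mathcal{D}},\mathbf{H}}(\mathbf{y}|\mathbf{x}_{\mathcal{D}},\mathbf{h})\, p_{\mathbf{H}}(\mathbf{h})\, p_{\mathbf{X}_{\mathcal{D}}|\mathbf{C}}(\mathbf{x}_{\mathcal{D}}|\mathbf{c})\, p_{\mathbf{C}|\mathbf{U}}(\mathbf{c}|\mathbf{u})\, p_{\mathbf{U}}(\mathbf{u}) \\
&= \prod_{i \in \mathcal{D}} p_{Y_i|X_i,H_i}(y_i|x_i,h_i) \prod_{j \in \mathcal{P}} p_{Y_j|H_j}(y_j|h_j) \times p_{\mathbf{H}}(\mathbf{h}) \\
&\quad \times \prod_{n \in [1:N]} p_{X_{i_n}|\mathbf{C}^{(n)}}(x_{i_n}|\mathbf{c}^{(n)}) \times p_{\mathbf{C}|\mathbf{U}}(\mathbf{c}|\mathbf{u}) \\
&\quad \times \prod_{k \in [1:K]} p_{U_k}(u_k)
\end{aligned}
\qquad (28)
$$

where we used the fact that $\mathbf{H}$ is independent of $\mathbf{X}_{\mathcal{D}}$, $\mathbf{C}$, and $\mathbf{U}$ and $\mathbf{Y}$ is independent of $\mathbf{C}$ and $\mathbf{U}$ conditioned on $\mathbf{X}_{\mathcal{D}}$. Note that

$$p_{Y_i|X_i,H_i}(y_i|x_i,h_i) = \frac{\gamma}{\pi} \exp\left(-\gamma |y_i - h_i x_i|^2\right) = \mathrm{CN}(y_i; h_i x_i, 1/\gamma), \quad \text{for all } i \in \mathcal{D} \qquad (29)$$

$$p_{Y_i|H_i}(y_i|h_i) = \frac{\gamma}{\pi} \exp\left(-\gamma |y_i - h_i x_i|^2\right) = \mathrm{CN}(y_i; h_i x_i, 1/\gamma), \quad \text{for all } i \in \mathcal{P}. \qquad (30)$$

We choose for the prior distribution of $\mathbf{H}$

$$p_{\mathbf{H}}(\mathbf{h}) = \mathrm{CN}\left(\mathbf{h}; \boldsymbol{\mu}_{\mathbf{H}}^{\mathrm{P}}, (\boldsymbol{\Lambda}_{\mathbf{H}}^{\mathrm{P}})^{-1}\right).$$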

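Now define

$$\mathcal{I} \triangleq \{X_i \mid i \in \mathcal{D}\} \cup \{\mathbf{H}\} \cup \{C^{(1)}_1, \ldots, C^{(N)}_L\} \cup \{U_1, \ldots, U_K\} \qquad (31)$$

$$\mathcal{A} \triangleq \{p_{Y_i|X_i,H_i} \mid i \in \mathcal{D}\} \cup \{p_{Y_i|H_i} \mid i \in \mathcal{P}\} \cup \{p_{\mathbf{H}}\} \cup \{p_{X_{i_n}|\mathbf{C}^{(n)}} \mid n \in [1:N]\} \cup \{p_{\mathbf{C}|\mathbf{U}}\} \cup \{p_{U_k} \mid k \in [1:K]\} \qquad (32)$$

and set $f_a \triangleq a$ for all $a \in \mathcal{A}$. For example, we have $f_{p_{\mathbf{H}}}(\mathbf{h}) = p_{\mathbf{H}}(\mathbf{h})$. We choose a splitting of $\mathcal{A}$ into $\mathcal{A}_{\mathrm{BP}}$ and $\mathcal{A}_{\mathrm{MF}}$ with

$$\mathcal{A}_{\mathrm{BP}} \triangleq \{p_{X_{i_n}|\mathbf{C}^{(n)}} \mid n \in [1:N]\} \cup \{p_{\mathbf{C}|\mathbf{U}}\} \cup \{p_{U_k} \mid k \in [1:K]\}$$
$$\mathcal{A}_{\mathrm{MF}} \triangleq \{p_{Y_i|X_i,H_i} \mid i \in \mathcal{D}\} \cup \{p_{Y_i|H_i} \mid i \in \mathcal{P}\} \cup \{p_{\mathbf{H}}\}. \qquad (33)$$

With this selection

$$\mathcal{I}_{\mathrm{BP}} = \{X_i \mid i \in \mathcal{D}\} \cup \{C^{(1)}_1, \ldots, C^{(N)}_L\} \cup \{U_1, \ldots, U_K\}$$
$$\mathcal{I}_{\mathrm{MF}} = \{X_i \mid i \in \mathcal{D}\} \cup \{\mathbf{H}\}$$

which implies that $\mathcal{I}_{\mathrm{BP}} \cap \mathcal{I}_{\mathrm{MF}} = \{X_i \mid i \in \mathcal{D}\}$. The factor graph corresponding to the factorization in (28) with the splitting of $\mathcal{A}$ into $\mathcal{A}_{\mathrm{MF}}$ and $\mathcal{A}_{\mathrm{BP}}$ as in (33) is depicted in Figure 1.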

We now show how to apply the variant of Algorithm 1 referred to in Remark 3.5 to the factor graph depicted in Figure 1. Note that (26) is fulfilled in this example; however, cycles occur in the BP part of the factor graph due to the combination of (convolutional) coding, interleaving, and high-order modulation (see Table I).



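Algorithm 2:

1) Initialize

$$b_{\mathbf{H}}(\mathbf{h}) = \mathrm{CN}(\mathbf{h}; \boldsymbol{\mu}_{\mathbf{H}}, \boldsymbol{\Lambda}_{\mathbf{H}}^{-1})$$

by setting

$$\boldsymbol{\mu}_{\mathbf{H}} = \boldsymbol{\Lambda}_{\mathbf{H}}^{-1}\left(\boldsymbol{\Lambda}_{\mathbf{H}}^{\mathrm{P}} \boldsymbol{\mu}_{\mathbf{H}}^{\mathrm{P}} + \boldsymbol{\Lambda}_{\mathbf{H}}^{\mathrm{O}} \boldsymbol{\mu}_{\mathbf{H}}^{\mathrm{O}}\right), \qquad \boldsymbol{\Lambda}_{\mathbf{H}} = \boldsymbol{\Lambda}_{\mathbf{H}}^{\mathrm{P}} + \boldsymbol{\Lambda}_{\mathbf{H}}^{\mathrm{O}}$$

with

$$[\boldsymbol{\Lambda}_{\mathbf{H}}^{\mathrm{O}}]_{i,j} = \begin{cases} \gamma |x_i|^2 & \text{if } i = j \in \mathcal{P} \\ 0 & \text{else} \end{cases}$$

and

$$[\boldsymbol{\Lambda}_{\mathbf{H}}^{\mathrm{O}} \boldsymbol{\mu}_{\mathbf{H}}^{\mathrm{O}}]_i = \begin{cases} \gamma y_i x_i^* & \text{if } i \in \mathcal{P} \\ 0 & \text{if } i \in \mathcal{D} \end{cases}$$

and set

$$n_{\mathbf{H} \to p_{Y_i|X_i,H_i}}(\mathbf{h}) = b_{\mathbf{H}}(\mathbf{h}), \quad \text{for all } i \in \mathcal{D}.$$

2) Using the particular form of the distributions $p_{Y_i|X_i,H_i}$ ($i \in \mathcal{D}$) in (29) and $p_{Y_i|H_i}$ ($i \in \mathcal{P}$) in (30), compute

$$
\begin{aligned}
m^{\mathrm{MF}}_{p_{Y_i|X_i,H_i} \to X_i}(x_i)
&\propto \exp\left(-\gamma \int \mathrm{d}\mathbf{h}\; n_{\mathbf{H} \to p_{Y_i|X_i,H_i}}(\mathbf{h})\, |y_i - h_i x_i|^2\right) \\
&\propto \exp\left(-\gamma \left(\sigma^2_{H_i} + |\mu_{H_i}|^2\right) \left| x_i - \frac{y_i \mu^*_{H_i}}{\sigma^2_{H_i} + |\mu_{H_i}|^2} \right|^2\right) \\
&\propto \mathrm{CN}\left(x_i; \frac{y_i \mu^*_{H_i}}{\sigma^2_{H_i} + |\mu_{H_i}|^2}, \frac{1}{\gamma\left(\sigma^2_{H_i} + |\mu_{H_i}|^2\right)}\right)
\end{aligned}
$$

for all $i \in \mathcal{D}$ with $\sigma^2_{H_j} = [\boldsymbol{\Lambda}_{\mathbf{H}}^{-1}]_{j,j}$ ($j \in \mathcal{D}$).

3) Use the messages $m^{\mathrm{MF}}_{p_{Y_i|X_i,H_i} \to X_i}(x_i)$ ($i \in \mathcal{D}$) as fixed input for the BP part and run BP.

4) After running BP in the BP part, compute the messages $n_{X_i \to p_{Y_i|X_i,H_i}}(x_i)$ ($i \in \mathcal{D}$) and update the messages in the MF part. Namely, after setting

$$\mu_{X_i} = \sum_{x_i \in \mathcal{S}} n_{X_i \to p_{Y_i|X_i,H_i}}(x_i)\, x_i, \qquad \sigma^2_{X_i} = \sum_{x_i \in \mathcal{S}} n_{X_i \to p_{Y_i|X_i,H_i}}(x_i)\, |x_i - \mu_{X_i}|^2$$

for all $i \in \mathcal{D}$, compute the messages

$$
\begin{aligned}
m^{\mathrm{MF}}_{p_{Y_i|X_i,H_i} \to \mathbf{H}}(\mathbf{h})
&\propto \exp\left(-\gamma \sum_{x_i \in \mathcal{S}} n_{X_i \to p_{Y_i|X_i,H_i}}(x_i)\, |y_i - h_i x_i|^2\right) \\
&\propto \exp\left(-\gamma \left(\sigma^2_{X_i} + |\mu_{X_i}|^2\right) \left| h_i - \frac{y_i \mu^*_{X_i}}{\sigma^2_{X_i} + |\mu_{X_i}|^2} \right|^2\right) \\
&\propto \mathrm{CN}\left(h_i; \frac{y_i \mu^*_{X_i}}{\sigma^2_{X_i} + |\mu_{X_i}|^2}, \frac{1}{\gamma\left(\sigma^2_{X_i} + |\mu_{X_i}|^2\right)}\right), \quad \text{for all } i \in \mathcal{D},
\end{aligned}
$$

$$m^{\mathrm{MF}}_{p_{Y_i|H_i} \to \mathbf{H}}(\mathbf{h}) \propto \mathrm{CN}\left(h_i; \frac{y_i x_i^*}{|x_i|^2}, \frac{1}{\gamma |x_i|^2}\right), \quad \text{for all } i \in \mathcal{P},$$

and

$$
\begin{aligned}
n_{\mathbf{H} \to p_{Y_i|X_i,H_i}}(\mathbf{h})
&= z_{\mathbf{H}}\, p_{\mathbf{H}}(\mathbf{h}) \prod_{j \in \mathcal{D}} m^{\mathrm{MF}}_{p_{Y_j|X_j,H_j} \to \mathbf{H}}(\mathbf{h}) \prod_{j \in \mathcal{P}} m^{\mathrm{MF}}_{p_{Y_j|H_j} \to \mathbf{H}}(\mathbf{h}) \\
&= \frac{\det(\boldsymbol{\Lambda}_{\mathbf{H}})}{\pi^{M+N}} \exp\left(-(\mathbf{h} - \boldsymbol{\mu}_{\mathbf{H}})^{\mathrm{H}} \boldsymbol{\Lambda}_{\mathbf{H}} (\mathbf{h} - \boldsymbol{\mu}_{\mathbf{H}})\right) \\
&= \mathrm{CN}(\mathbf{h}; \boldsymbol{\mu}_{\mathbf{H}}, \boldsymbol{\Lambda}_{\mathbf{H}}^{-1})
\end{aligned}
$$

for all $i \in \mathcal{D}$. Here, we used Lemma 3 in Appendix F to get the updated parameters

$$\boldsymbol{\mu}_{\mathbf{H}} = \boldsymbol{\Lambda}_{\mathbf{H}}^{-1}\left(\boldsymbol{\Lambda}_{\mathbf{H}}^{\mathrm{P}} \boldsymbol{\mu}_{\mathbf{H}}^{\mathrm{P}} + \boldsymbol{\Lambda}_{\mathbf{H}}^{\mathrm{O}} \boldsymbol{\mu}_{\mathbf{H}}^{\mathrm{O}}\right), \qquad \boldsymbol{\Lambda}_{\mathbf{H}} = \boldsymbol{\Lambda}_{\mathbf{H}}^{\mathrm{P}} + \boldsymbol{\Lambda}_{\mathbf{H}}^{\mathrm{O}} \qquad (34)$$

with

$$[\boldsymbol{\Lambda}_{\mathbf{H}}^{\mathrm{O}}]_{i,j} = \begin{cases} \gamma\left(\sigma^2_{X_i} + |\mu_{X_i}|^2\right) & \text{if } i = j \in \mathcal{D} \\ \gamma |x_i|^2 & \text{if } i = j \in \mathcal{P} \\ 0 & \text{else} \end{cases}$$

and

$$[\boldsymbol{\Lambda}_{\mathbf{H}}^{\mathrm{O}} \boldsymbol{\mu}_{\mathbf{H}}^{\mathrm{O}}]_i = \begin{cases} \gamma y_i \mu^*_{X_i} & \text{if } i \in \mathcal{D} \\ \gamma y_i x_i^* & \text{if } i \in \mathcal{P}. \end{cases}$$

The update for the belief $b_{\mathbf{H}}$ is

$$b_{\mathbf{H}}(\mathbf{h}) = n_{\mathbf{H} \to p_{Y_i|X_i,H_i}}(\mathbf{h})$$

i.e., $b_{\mathbf{H}}(\mathbf{h}) = \mathrm{CN}(\mathbf{h}; \boldsymbol{\mu}_{\mathbf{H}}, \boldsymbol{\Lambda}_{\mathbf{H}}^{-1})$.

5) Proceed as described in 2).

A. "Extrinsic" values versus "APP"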



In consideration of Remark 3.3 it is instructive to analyze the messages coming from the variable nodes $\mathcal{I}_{\mathrm{BP}} \cap \mathcal{I}_{\mathrm{MF}} = \{X_{i_1}, \ldots, X_{i_N}\}$, which are contained in the BP and MF part of the factor graph depicted in Figure 1. Whether a message passing from a variable node to a factor node is an "extrinsic" value or an "APP" depends on whether the corresponding factor node is in the BP or the MF part. Thus, for all $n \in [1 : N]$, the messages

$$n_{X_{i_n} \to p_{X_{i_n}|\mathbf{C}^{(n)}}}(x_{i_n}) = m^{\mathrm{MF}}_{p_{Y_{i_n}|X_{i_n},H_{i_n}} \to X_{i_n}}(x_{i_n})$$

which are passed into the BP part, are "extrinsic" values, whereas the messages

$$n_{X_{i_n} \to p_{Y_{i_n}|X_{i_n},H_{i_n}}}(x_{i_n}) = m^{\mathrm{BP}}_{p_{X_{i_n}|\mathbf{C}^{(n)}} \to X_{i_n}}(x_{i_n})\; m^{\mathrm{MF}}_{p_{Y_{i_n}|X_{i_n},H_{i_n}} \to X_{i_n}}(x_{i_n})$$

which are passed into the MF part, are "APPs". Note that this result is aligned with the strategies proposed in [19], [20], where "APPs" are used for channel estimation and "extrinsic values" for detection.
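The distinction can be made concrete with a small sketch for a single data symbol. It is illustrative only: the function name and the list-based representation of messages over the alphabet are assumptions, and both outputs are normalized for easier comparison, which only rescales the messages.

```python
def normalize(p):
    """Scale a list of nonnegative numbers so that it sums to one."""
    total = sum(p)
    return [v / total for v in p]


def outgoing_messages(m_bp, m_mf):
    """Messages leaving a symbol node that lies in both the BP and MF part.

    m_bp: incoming message from the modulation factor (BP part).
    m_mf: incoming message from the observation factor (MF part).
    Both are lists indexed by the points of the modulation alphabet.
    """
    # Toward the BP factor: the message from that factor is excluded,
    # so only the MF message remains ("extrinsic" value).
    extrinsic = normalize(m_mf)
    # Toward the MF factor: all incoming messages are multiplied ("APP").
    app = normalize([a * b for a, b in zip(m_bp, m_mf)])
    return extrinsic, app
```

Changing `m_bp` leaves the extrinsic output untouched but changes the APP, which is exactly the asymmetry described above.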



B. Level of MF approximation 

Note that there is an ambiguity in the choice of variable nodes in the MF part. This ambiguity reflects the "level of the MF approximation" and results in a family of different algorithms. For example, instead of choosing $\mathbf{H}$ as a single random variable, we could have chosen $H_i$ ($i \in [1 : M + N]$) to be separate variable nodes in the factor graph. In this case we make the assumption that the random variables $H_i$ ($i \in [1 : M + N]$) are independent and the set of indices $\mathcal{I}$ in (31) has to be replaced by

$$\mathcal{I} \triangleq \{X_i \mid i \in \mathcal{D}\} \cup \{H_i \mid i \in [1 : M + N]\} \cup \{C^{(1)}_1, \ldots, C^{(N)}_L\} \cup \{U_1, \ldots, U_K\}.$$

Since this is an additional approximation, the performance of the receiver is expected to decrease compared to the case where we choose $\mathbf{H}$ as a single random variable. However, it is possible that the complexity reduces by applying an additional MF approximation. See [23] for further discussions on this ambiguity for a class of MIMO-OFDM receivers.

C. Comparison with BP combined with Gaussian approximation

The example makes evident how the complexity of the message passing algorithm can be reduced by exploiting the conjugate-exponential property of the MF part, which leads to simple update equations of the belief $b_{\mathbf{H}}$. In fact, at each iteration of the algorithm we only have to update the parameters of a Gaussian distribution (34). In comparison, let us consider an alternative split of $\mathcal{A}$ obtained by moving the factor nodes $p_{Y_i|X_i,H_i}$ ($i \in \mathcal{D}$) in (29) and $p_{Y_i|H_i}$ ($i \in \mathcal{P}$) in (30) to the BP part. This is equivalent to applying BP to the whole factor graph in Figure 1 because $m^{\mathrm{BP}}_{p_{\mathbf{H}} \to \mathbf{H}} = m^{\mathrm{MF}}_{p_{\mathbf{H}} \to \mathbf{H}}$. Doing so, each message $m^{\mathrm{BP}}_{p_{Y_i|X_i,H_i} \to \mathbf{H}}(h_i)$ ($i \in \mathcal{D}$) no longer admits a closed-form expression in terms of the mean and the variance of the random variable $X_i$ and becomes a mixture of Gaussian pdfs with $2^L$ components; in consequence, each message $n_{\mathbf{H} \to p_{Y_i|X_i,H_i}}(\mathbf{h})$ ($i \in \mathcal{D}$) becomes a sum of $2^{L(N-1)}$ terms. To keep the complexity of computing these messages tractable one has to rely on additional approximations.
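To illustrate how light the MF updates are, the following sketch implements the Gaussian parameter computations for a single subcarrier. It is not the authors' code: it assumes a diagonal prior precision, i.e., the fully factorized variant with separate channel variables discussed in Subsection IV-B, so that the update in (34) decouples per subcarrier. The function and argument names are hypothetical.

```python
def mf_message_to_symbol(y, mu_h, var_h, gamma):
    """Mean and variance of the Gaussian-shaped MF message sent to a symbol.

    y: received sample, mu_h / var_h: mean and variance of the channel
    belief on this subcarrier, gamma: noise precision.
    """
    second_moment = var_h + abs(mu_h) ** 2
    mean = y * mu_h.conjugate() / second_moment
    variance = 1.0 / (gamma * second_moment)
    return mean, variance


def update_channel_belief(y, mu_x, var_x, gamma, prior_mean, prior_prec):
    """Per-subcarrier Gaussian update of the channel belief.

    Assumes a scalar (diagonal) prior precision. For a pilot subcarrier,
    pass the known pilot symbol as mu_x and set var_x = 0.
    """
    obs_prec = gamma * (var_x + abs(mu_x) ** 2)   # precision from the observation
    obs_term = gamma * y * mu_x.conjugate()       # precision-weighted observation mean
    post_prec = prior_prec + obs_prec             # precisions add
    post_mean = (prior_prec * prior_mean + obs_term) / post_prec
    return post_mean, 1.0 / post_prec
```

Each call costs a handful of scalar operations, in contrast to the exponentially growing Gaussian mixtures that arise when the observation factors are moved to the BP part.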

As suggested in [34], [35], we can approximate each message $m^{\mathrm{BP}}_{p_{Y_i|X_i,H_i} \to \mathbf{H}}(h_i)$ ($i \in \mathcal{D}$) by a Gaussian pdf. BP combined with this approximation is comparable in terms of complexity to Algorithm 2 since the computations of the updates of the messages are equally complex. However, Algorithm 2 clearly outperforms this alternative, as can be seen in Figure 2. It can also be noticed that the performance of Algorithm 2 is close to the case with perfect channel state information (CSI) at the receiver, even with a low density of pilots, i.e., such that the spacing between any two consecutive pilots ($\Delta_P$) approximately equals the coherence bandwidth ($W_{\mathrm{coh}}$) of the channel or twice of it.
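As a rough plausibility check of the pilot densities, the pilot spacing can be estimated from the system parameters. The sketch below assumes that evenly spaced pilots are separated by the total number of subcarriers divided by the number of pilots; the exact pilot placement is not specified here, so the function is only an approximation with hypothetical names.

```python
def pilot_spacing_hz(num_subcarriers, num_pilots, subcarrier_spacing_hz):
    """Approximate frequency spacing between consecutive, evenly spaced pilots."""
    return num_subcarriers / num_pilots * subcarrier_spacing_hz


# Example with 300 subcarriers and 15 kHz subcarrier spacing:
spacing_25 = pilot_spacing_hz(300, 25, 15e3)   # 180 kHz
spacing_13 = pilot_spacing_hz(300, 13, 15e3)   # roughly 346 kHz
```

With a coherence bandwidth of about 200 kHz, these values are of the order of one and two coherence bandwidths, respectively.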

To circumvent the intractability of the BP-based receiver, one could also apply other approximate inference algorithms to the factor graph like, e.g., expectation propagation (EP). A comparison between EP and BP-MF can be found in [36], where it was shown that BP-MF yields the best performance-complexity tradeoff and does not suffer from numerical instability.

TABLE I
Parameters of the OFDM system.

Parameter                              Value
Number of subcarriers                  M + N = 300
Number of evenly spaced pilots         M ∈ {13, 25}
Modulation scheme for pilot symbols    QPSK
Modulation scheme for data symbols     16 QAM (L = 4)
Convolutional channel code             R = 1/3, (133, 171, 165)_8
Multipath channel model                3GPP ETU
Subcarrier spacing                     15 kHz
Coherence bandwidth (1)                W_coh ≈ 200 kHz

(1) Calculated as the reciprocal of the maximum excess delay.




[Figure 2: plot of BER versus SNR [dB], with the SNR axis ranging from 2 dB to 14 dB.]

Fig. 2. Bit error rate (BER) as a function of signal-to-noise ratio (SNR) for Algorithm 2 (BP-MF), BP combined with Gaussian approximation as described in Subsection IV-C, and BP with perfect CSI at the receiver. Pilot spacing $\Delta_P \approx W_{\mathrm{coh}}$ ($M = 25$) and $\Delta_P \approx 2 W_{\mathrm{coh}}$ ($M = 13$).

D. Estimation of noise precision 

Algorithm 2 can be easily extended to the case where the noise precision $\gamma$ is a realization of a random variable $\Gamma$. In fact, since $\ln p_{Y_i|X_i,H_i,\Gamma}$ ($i \in \mathcal{D}$) and $\ln p_{Y_i|H_i,\Gamma}$ ($i \in \mathcal{P}$) depend on $x_i$ and $h_i$ only through a term that is linear in $\gamma$, we can replace any dependence on $\gamma$ in the existing messages in Algorithm 2 by the expected value of $\Gamma$ and get simple expressions for the additional messages using a Gamma prior distribution for $\Gamma$. This again shows the benefit of exploiting the conjugate-exponential model property in the MF part for parameter estimation. See [23] for further details on the explicit form of the additional messages.

V. Conclusion and Outlook 

We showed that the message passing fixed-point equations 
of a combination of BP and the MF approximation correspond 
to stationary points of one single constrained region-based 
free energy approximation. These stationary points are in one- 
to-one correspondence to solutions of a coupled system of 
message passing fixed-point equations. For an arbitrary factor 
graph and a choice of a splitting of the factor nodes into a 
set of MF and BP factor nodes, our result gives immediately 
the corresponding message passing fixed-point equations and 
yields an interpretation of the computed beliefs as stationary 
points. Moreover, we presented an algorithm for updating the 
messages that is guaranteed to converge provided that the fac- 
tor graph fulfills certain technical conditions. We also showed 
how to extend the MF part in the factor graph to continuous 
random variables and to include hard constraints in the BP 
part of the factor graph. Finally, we illustrated the computation 
of the messages of our algorithm in a simple example. This 
example demonstrates the efficiency of the combined scheme 
in models in which BP messages are computationally in- 
tractable. The proposed algorithm performs significantly better 
than the commonly used approach of using BP combined 
with a Gaussian approximation of computationally demanding 
messages.

An interesting extension of our result would be to generalize the BP part to contain also continuous random variables. The results in [37] provide a promising approach. Indeed, they could be used to generalize the Lagrange multiplier for the marginalization constraints to the continuous case. However, these methods are based on the assumption that the objective function is Fréchet differentiable [38, p. 172]. In general, a region-based free energy approximation is neither Fréchet differentiable nor Gâteaux differentiable, at least not without any modification of the definitions used in standard textbooks [38, pp. 171-172]. An extension to continuous random variables in the BP part would allow a combination of BP with the MF approximation to be applied, e.g., to sensor self-localization, where both methods are used [39], [40]. Another interesting extension could be to generalize the region-based free energy approximation such that the messages in the BP part are equivalent to the messages passed in tree reweighted BP or to include second order correction terms in the MF approximation that are similar to the Onsager reaction term [30].

Appendix

A. Proof of Lemma 1

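Suppose that $\{m_{a \to i}(x_i), n_{i \to a}(x_i)\}$ ($a \in \mathcal{A}$, $i \in \mathcal{N}(a)$) is a solution of (7) and set

$$\tilde{m}_{a \to i}(x_i) \triangleq \kappa_{a,i}\, m_{a \to i}(x_i), \quad \text{for all } a \in \mathcal{A},\ i \in \mathcal{N}(a)$$
$$\tilde{n}_{i \to a}(x_i) \triangleq \tau_{a,i}\, n_{i \to a}(x_i), \quad \text{for all } a \in \mathcal{A},\ i \in \mathcal{N}(a) \qquad (35)$$

with $\kappa_{a,i}, \tau_{a,i} > 0$ ($a \in \mathcal{A}$, $i \in \mathcal{N}(a)$). Plugging (35) into (7) we obtain the following fixed-point equations for the messages $\{\tilde{m}_{a \to i}(x_i), \tilde{n}_{i \to a}(x_i)\}$ ($a \in \mathcal{A}$, $i \in \mathcal{N}(a)$).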

ieA^(a)\i x„\a;i j£^/{a)\i 

c£j\f{i)\a ceJ\f{i)\a 

(36) 

for all a G i G J\f{a). Now (|36] | is equivalent to ^ if and 
only if 

Ta,! = Y\. all a G -^7* G A/'(a) (37) 

c£M{i)\a 



> n ' 

jeAfia)\i 



for all a G ^, i G N{a) (38) 



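Footnote (to the remark on Fréchet and Gâteaux differentiability in Section V): For a positive real-valued function $b$, $b + \Delta b$ might fail to be a positive real-valued function for arbitrary perturbations $\Delta b$ with sufficiently small norm $\|\Delta b\|$.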



where the positive constants Za {a G A) are such that the 
beliefs ha [a G ^) in (|5]) are normalized to one. This 
normalization of the beliefs ha {a G A) in ^ gives 

E/a(Xa) n nj^aiXj) 



n 



: n 

jeJ\f(a) 



-, for all a G ^ 



(39) 



where we used ( |35] ) in the second step and (O in the last step. 
Combining (|37] |. ( |38] |. and (3% we obtain 



1 ^aA'^a-. 



9t 



, for all a G ^, i G Af{a) 



with 



Now suppose that (|9]) is fulfilled. Setting 



imjTi 



|A/<i)l 



for all a G ^, i G A/'(a) 
for ail a e A,i e M{a) 



and reversing all the steps finishes the proof. 

B. Extension of the MF approximation to continuous random 
variables 

Suppose that $p_{\mathbf{X}}$ is a pdf of the vector of random variables $\mathbf{X}$. In this appendix, we assume that all integrals in the region-based free energy approximation are Lebesgue integrals and have finite values, which can be verified by inspection of the factors $f_a$ ($a \in \mathcal{A}$) and the analytic expressions of the computed beliefs $b_i$ ($i \in \mathcal{I}$). An example where the MF approximation is applied to continuous random variables and combined with BP is discussed in Section IV.

For each $i \in \mathcal{I}$ we can rewrite $F_{\mathrm{MF}}$ in (11) as

$$F_{\mathrm{MF}} = D(b_i \,\|\, a_i) + \sum_{j \in \mathcal{I} \setminus i} \int b_j(x_j) \ln b_j(x_j)\, \mathrm{d}x_j - \sum_{a \in \mathcal{A} \setminus \mathcal{N}(i)} \int \prod_{j \in \mathcal{N}(a)} b_j(x_j) \ln f_a(\mathbf{x}_a)\, \mathrm{d}\mathbf{x}_a$$

with

$$a_i(x_i) \triangleq \exp\left(\sum_{a \in \mathcal{N}(i)} \int \ln f_a(\mathbf{x}_a) \prod_{j \in \mathcal{N}(a) \setminus i} b_j(x_j)\, \mathrm{d}x_j\right), \quad \text{for all } i \in \mathcal{I}.$$

It follows from [22, Th. 2.1] that $D(b_i \,\|\, a_i)$ is minimized subject to $\int b_i(x_i)\, \mathrm{d}x_i = 1$ if and only if

$$b_i(x_i) = \frac{a_i(x_i)}{\int a_i(x_i)\, \mathrm{d}x_i} \qquad (40)$$

up to sets of Lebesgue measure zero. Formally, $b_i$ in (40) differs from $b_i$ in (12) by replacing sums with Lebesgue integrals.
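The minimizer in (40) can also be checked with a short calculation. The normalization constant $Z_i$ below is notation introduced here only for illustration, and the calculation assumes $0 < Z_i < \infty$ and $\int b_i(x_i)\, \mathrm{d}x_i = 1$.

```latex
Z_i \triangleq \int a_i(x_i)\,\mathrm{d}x_i, \qquad
D(b_i \,\|\, a_i)
  = \int b_i(x_i) \ln\frac{b_i(x_i)}{a_i(x_i)}\,\mathrm{d}x_i
  = D\!\left(b_i \,\Big\|\, \frac{a_i}{Z_i}\right) - \ln Z_i .
```

Since $a_i / Z_i$ is a pdf, the first term on the right-hand side is nonnegative and vanishes only if $b_i = a_i / Z_i$ almost everywhere, while $\ln Z_i$ does not depend on $b_i$. This is consistent with the statement in (40).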

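C. Proof of Theorem 2

The proof of Theorem 2 is based on the ideas of the proof of [9, Th. 2]. However, we will see that we get a significant simplification by augmenting it with some of the arguments originally used in [11] for Markov random fields and adapted to factor graphs in [12]. In particular, we shall make use of the following observation. Recall the expression for $F_{\mathrm{BP,MF}}$ in (17)

$$
\begin{aligned}
F_{\mathrm{BP,MF}} = {}& \sum_{a \in \mathcal{A}_{\mathrm{BP}}} \sum_{\mathbf{x}_a} b_a(\mathbf{x}_a) \ln \frac{b_a(\mathbf{x}_a)}{f_a(\mathbf{x}_a)} \\
& - \sum_{a \in \mathcal{A}_{\mathrm{MF}}} \sum_{\mathbf{x}_a} \prod_{i \in \mathcal{N}(a)} b_i(x_i) \ln f_a(\mathbf{x}_a) \\
& - \sum_{i \in \mathcal{I}} \left(|\mathcal{N}_{\mathrm{BP}}(i)| - 1\right) \sum_{x_i} b_i(x_i) \ln b_i(x_i)
\end{aligned}
\qquad (41)
$$

the marginalization constraints

$$b_i(x_i) = \sum_{\mathbf{x}_a \setminus x_i} b_a(\mathbf{x}_a), \quad \text{for all } a \in \mathcal{A}_{\mathrm{BP}},\ i \in \mathcal{N}(a) \qquad (42)$$

and the normalization constraints

$$\sum_{x_i} b_i(x_i) = 1, \quad \text{for all } i \in \mathcal{I}_{\mathrm{MF}} \setminus \mathcal{I}_{\mathrm{BP}}$$
$$\sum_{\mathbf{x}_a} b_a(\mathbf{x}_a) = 1, \quad \text{for all } a \in \mathcal{A}_{\mathrm{BP}}. \qquad (43)$$

Using the marginalization constraints (42), we see that

$$
\begin{aligned}
\sum_{a \in \mathcal{A}_{\mathrm{BP}}} \sum_{\mathbf{x}_a} b_a(\mathbf{x}_a) \ln \prod_{i \in \mathcal{N}(a)} b_i(x_i)
&= \sum_{a \in \mathcal{A}_{\mathrm{BP}}} \sum_{i \in \mathcal{N}(a)} \sum_{\mathbf{x}_a} b_a(\mathbf{x}_a) \ln b_i(x_i) \\
&= \sum_{a \in \mathcal{A}_{\mathrm{BP}}} \sum_{i \in \mathcal{N}(a)} \sum_{x_i} b_i(x_i) \ln b_i(x_i) \\
&= \sum_{i \in \mathcal{I}_{\mathrm{BP}}} \sum_{a \in \mathcal{N}_{\mathrm{BP}}(i)} \sum_{x_i} b_i(x_i) \ln b_i(x_i) \\
&= \sum_{i \in \mathcal{I}_{\mathrm{BP}}} |\mathcal{N}_{\mathrm{BP}}(i)| \sum_{x_i} b_i(x_i) \ln b_i(x_i).
\end{aligned}
\qquad (44)
$$

Combining (44) with (41), we further get

$$
\begin{aligned}
F_{\mathrm{BP,MF}} = {}& - \sum_{a \in \mathcal{A}_{\mathrm{BP}}} \sum_{\mathbf{x}_a} b_a(\mathbf{x}_a) \ln f_a(\mathbf{x}_a) \\
& - \sum_{a \in \mathcal{A}_{\mathrm{MF}}} \sum_{\mathbf{x}_a} \prod_{i \in \mathcal{N}(a)} b_i(x_i) \ln f_a(\mathbf{x}_a) \\
& + \sum_{i \in \mathcal{I}} \sum_{x_i} b_i(x_i) \ln b_i(x_i) \\
& + \sum_{a \in \mathcal{A}_{\mathrm{BP}}} I_a
\end{aligned}
\qquad (45)
$$

with the mutual information [25, p. 19]

$$I_a \triangleq \sum_{\mathbf{x}_a} b_a(\mathbf{x}_a) \ln \frac{b_a(\mathbf{x}_a)}{\prod_{i \in \mathcal{N}(a)} b_i(x_i)}, \quad \text{for all } a \in \mathcal{A}_{\mathrm{BP}}.$$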

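Next, we shall compute the stationary points of the Lagrangian

$$
\begin{aligned}
L_{\mathrm{BP,MF}} \triangleq {}& F_{\mathrm{BP,MF}} - \sum_{a \in \mathcal{A}_{\mathrm{BP}}} \sum_{i \in \mathcal{N}(a)} \sum_{x_i} \lambda_{a,i}(x_i) \left( b_i(x_i) - \sum_{\mathbf{x}_a \setminus x_i} b_a(\mathbf{x}_a) \right) \\
& - \sum_{i \in \mathcal{I}_{\mathrm{MF}} \setminus \mathcal{I}_{\mathrm{BP}}} \gamma_i \left( \sum_{x_i} b_i(x_i) - 1 \right) - \sum_{a \in \mathcal{A}_{\mathrm{BP}}} \gamma_a \left( \sum_{\mathbf{x}_a} b_a(\mathbf{x}_a) - 1 \right)
\end{aligned}
\qquad (46)
$$

using the expression for $F_{\mathrm{BP,MF}}$ in (45). The particular form of $F_{\mathrm{BP,MF}}$ in (45) is convenient because the marginalization constraints in (42) imply that for all $i \in \mathcal{I}$ and $a \in \mathcal{A}_{\mathrm{BP}}$ we have

$$\frac{\partial I_a}{\partial b_i(x_i)} = -\mathbb{1}_{\mathcal{N}_{\mathrm{BP}}(i)}(a).$$

Setting the derivative of $L_{\mathrm{BP,MF}}$ in (46) with respect to $b_i(x_i)$ and $b_a(\mathbf{x}_a)$ equal to zero for all $i \in \mathcal{I}$ and $a \in \mathcal{A}_{\mathrm{BP}}$, we get the following fixed-point equations for the stationary points:

$$
\begin{aligned}
\ln b_i(x_i) = {}& \sum_{a \in \mathcal{N}_{\mathrm{BP}}(i)} \lambda_{a,i}(x_i) + \sum_{a \in \mathcal{N}_{\mathrm{MF}}(i)} \sum_{\mathbf{x}_a \setminus x_i} \prod_{j \in \mathcal{N}(a) \setminus i} b_j(x_j) \ln f_a(\mathbf{x}_a) \\
& + |\mathcal{N}_{\mathrm{BP}}(i)| + \mathbb{1}_{\mathcal{I}_{\mathrm{MF}} \setminus \mathcal{I}_{\mathrm{BP}}}(i)\, \gamma_i - 1, \quad \text{for all } i \in \mathcal{I} \\
\ln b_a(\mathbf{x}_a) = {}& \ln f_a(\mathbf{x}_a) - \sum_{i \in \mathcal{N}(a)} \lambda_{a,i}(x_i) + \ln \prod_{i \in \mathcal{N}(a)} b_i(x_i) + \gamma_a - 1, \quad \text{for all } a \in \mathcal{A}_{\mathrm{BP}}.
\end{aligned}
\qquad (47)
$$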


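Setting

$$m^{\mathrm{BP}}_{a \to i}(x_i) \triangleq \exp\left(\lambda_{a,i}(x_i) + 1 - \frac{1}{|\mathcal{N}_{\mathrm{BP}}(i)|}\right), \quad \text{for all } a \in \mathcal{A}_{\mathrm{BP}},\ i \in \mathcal{N}(a)$$

$$m^{\mathrm{MF}}_{a \to i}(x_i) \triangleq \exp\left(\sum_{\mathbf{x}_a \setminus x_i} \prod_{j \in \mathcal{N}(a) \setminus i} b_j(x_j) \ln f_a(\mathbf{x}_a)\right), \quad \text{for all } a \in \mathcal{A}_{\mathrm{MF}},\ i \in \mathcal{N}(a)$$

we can rewrite (47) as

$$b_i(x_i) = z_i \prod_{a \in \mathcal{N}_{\mathrm{BP}}(i)} m^{\mathrm{BP}}_{a \to i}(x_i) \prod_{a \in \mathcal{N}_{\mathrm{MF}}(i)} m^{\mathrm{MF}}_{a \to i}(x_i), \quad \text{for all } i \in \mathcal{I} \qquad (48)$$

$$b_a(\mathbf{x}_a) = z_a\, f_a(\mathbf{x}_a) \prod_{i \in \mathcal{N}(a)} \frac{b_i(x_i)}{m^{\mathrm{BP}}_{a \to i}(x_i)}, \quad \text{for all } a \in \mathcal{A}_{\mathrm{BP}} \qquad (49)$$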

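where

$$z_i \triangleq \exp\left(\mathbb{1}_{\mathcal{I}_{\mathrm{MF}} \setminus \mathcal{I}_{\mathrm{BP}}}(i)\, \gamma_i\right), \quad \text{for all } i \in \mathcal{I}$$

$$z_a \triangleq \exp\left(\gamma_a - 1 + \sum_{i \in \mathcal{N}(a)} \left(1 - \frac{1}{|\mathcal{N}_{\mathrm{BP}}(i)|}\right)\right), \quad \text{for all } a \in \mathcal{A}_{\mathrm{BP}}$$

are such that the normalization constraints in (43) are fulfilled. Finally, we define

$$n_{i \to a}(x_i) \triangleq z_i \prod_{c \in \mathcal{N}_{\mathrm{BP}}(i) \setminus \{a\}} m^{\mathrm{BP}}_{c \to i}(x_i) \prod_{c \in \mathcal{N}_{\mathrm{MF}}(i)} m^{\mathrm{MF}}_{c \to i}(x_i) \qquad (50)$$

for all $a \in \mathcal{A}$, $i \in \mathcal{N}(a)$. Plugging the expression for $n_{i \to a}(x_i)$ in (50) into the expression for $b_a(\mathbf{x}_a)$ in (49), we find that

$$b_i(x_i) = z_i \prod_{a \in \mathcal{N}_{\mathrm{BP}}(i)} m^{\mathrm{BP}}_{a \to i}(x_i) \prod_{a \in \mathcal{N}_{\mathrm{MF}}(i)} m^{\mathrm{MF}}_{a \to i}(x_i), \quad \text{for all } i \in \mathcal{I}$$
$$b_a(\mathbf{x}_a) = z_a\, f_a(\mathbf{x}_a) \prod_{i \in \mathcal{N}(a)} n_{i \to a}(x_i), \quad \text{for all } a \in \mathcal{A}_{\mathrm{BP}}. \qquad (51)$$

Using the marginalization constraints in (42) in combination with (51) and noting that $z_i = 1$ for all $i \in \mathcal{I}_{\mathrm{BP}}$ we further find that

$$
\begin{aligned}
n_{i \to a}(x_i)\, m^{\mathrm{BP}}_{a \to i}(x_i) &= b_i(x_i) \\
&= \sum_{\mathbf{x}_a \setminus x_i} b_a(\mathbf{x}_a) \\
&= z_a \sum_{\mathbf{x}_a \setminus x_i} f_a(\mathbf{x}_a) \prod_{j \in \mathcal{N}(a)} n_{j \to a}(x_j)
\end{aligned}
\qquad (52)
$$

for all $a \in \mathcal{A}_{\mathrm{BP}}$, $i \in \mathcal{N}(a)$. Dividing both sides of (52) by $n_{i \to a}(x_i)$ gives

$$m^{\mathrm{BP}}_{a \to i}(x_i) = z_a \sum_{\mathbf{x}_a \setminus x_i} f_a(\mathbf{x}_a) \prod_{j \in \mathcal{N}(a) \setminus i} n_{j \to a}(x_j) \qquad (53)$$

for all $a \in \mathcal{A}_{\mathrm{BP}}$, $i \in \mathcal{N}(a)$. Noting that $n_{j \to a}(x_j) = b_j(x_j)$ for all $a \in \mathcal{A}_{\mathrm{MF}}$ and $j \in \mathcal{N}(a)$, we can write the messages $m^{\mathrm{MF}}_{a \to i}(x_i)$ defined above as

$$m^{\mathrm{MF}}_{a \to i}(x_i) = \exp\left(\sum_{\mathbf{x}_a \setminus x_i} \prod_{j \in \mathcal{N}(a) \setminus i} n_{j \to a}(x_j) \ln f_a(\mathbf{x}_a)\right) \qquad (54)$$

for all $a \in \mathcal{A}_{\mathrm{MF}}$, $i \in \mathcal{N}(a)$. Now (50), (53), and (54) are equivalent to (22) and (51) is equivalent to (21). This completes the proof that stationary points of the Lagrangian in (20) must be fixed-points with positive beliefs fulfilling (21). Since all the steps are reversible, this also completes the proof of Theorem 2.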

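D. Proof of Lemma 2

We rewrite $F_{\mathrm{BP,MF}}$ in (17) as $F_{\mathrm{BP,MF}} = F_1 + F_2 + F_3$, with

$$F_1 \triangleq \sum_{a \in \mathcal{A}_{\mathrm{BP}}} D(b_a \,\|\, f_a)$$
$$F_2 \triangleq \sum_{a \in \mathcal{A}_{\mathrm{MF}}} D\Big(\prod_{i \in \mathcal{N}(a)} b_i \,\Big\|\, f_a\Big)$$
$$F_3 \triangleq -\sum_{i \in \mathcal{I}} \left(|\mathcal{N}_{\mathrm{BP}}(i)| + |\mathcal{N}_{\mathrm{MF}}(i)| - 1\right) \sum_{x_i} b_i(x_i) \ln b_i(x_i)$$

and set

$$k_a \triangleq \sum_{\mathbf{x}_a} f_a(\mathbf{x}_a), \quad \text{for all } a \in \mathcal{A}.$$

Then

$$F_1 = \sum_{a \in \mathcal{A}_{\mathrm{BP}}} D(b_a \,\|\, f_a / k_a) - \sum_{a \in \mathcal{A}_{\mathrm{BP}}} \ln k_a \geq -\sum_{a \in \mathcal{A}_{\mathrm{BP}}} \ln k_a > -\infty$$

$$F_2 = \sum_{a \in \mathcal{A}_{\mathrm{MF}}} D\Big(\prod_{i \in \mathcal{N}(a)} b_i \,\Big\|\, f_a / k_a\Big) - \sum_{a \in \mathcal{A}_{\mathrm{MF}}} \ln k_a \geq -\sum_{a \in \mathcal{A}_{\mathrm{MF}}} \ln k_a > -\infty$$

$$F_3 \geq 0.$$

This proves 1). Now $F_3 < \infty$, (24) implies that $F_2 < \infty$, and (23) implies that $F_1 < \infty$ if and only if (25) is fulfilled, which proves 2).

Suppose that we have fixed all $b_a(\mathbf{x}_a)$ ($a \in \mathcal{A}_{\mathrm{BP}}$) from (25). Then the analysis for the remaining $b_i(x_i)$ ($i \in \mathcal{I}$) and $b_a(\mathbf{x}_a)$ excluding all $\mathbf{x}_a$ from (25) ($a \in \mathcal{A}_{\mathrm{BP}}$) is the same as in the proof of Theorem 2 and the resulting fixed-point equations are identical to (21) and (22) excluding all $\mathbf{x}_a$ from (25) and vice versa, which proves 3). We can reintroduce the realizations $\mathbf{x}_a$ with $f_a(\mathbf{x}_a) = 0$ ($a \in \mathcal{A}_{\mathrm{BP}}$) from (25) in (22) because they do not contribute to the message passing update equations, as can be seen immediately from the definition of the messages $m^{\mathrm{BP}}_{a \to i}(x_i)$ ($a \in \mathcal{A}_{\mathrm{BP}}$, $i \in \mathcal{N}(a)$) in (22). The same argument implies that (25) is a special case of the first equation in (21), which proves 4) and, therefore, finishes the proof of Lemma 2.

E. Proof of convergence

In order to finish the proof of convergence for the algorithm presented in Subsection III-B we need to show that running the forward/backward algorithm in the BP part in step 2) of Algorithm 1 cannot increase the region-based free energy approximation $F_{\mathrm{BP,MF}}$ in (17). To this end we analyze the factorization

$$p(\mathbf{x}_{\mathrm{BP}}) \propto \prod_{a \in \mathcal{A}_{\mathrm{BP}}} f_a(\mathbf{x}_a) \prod_{i \in \mathcal{I}_{\mathrm{BP}} \cap \mathcal{I}_{\mathrm{MF}}} \prod_{b \in \mathcal{N}_{\mathrm{MF}}(i)} m^{\mathrm{MF}}_{b \to i}(x_i) \qquad (55)$$

with $\mathbf{x}_{\mathrm{BP}} \triangleq (x_i \mid i \in \mathcal{I}_{\mathrm{BP}})^{\mathrm{T}}$. The factorization in (55) is the product of the factorization of the BP part and the incoming messages from the MF part. The Bethe free energy (3) corresponding to the factorization in (55) is

$$
\begin{aligned}
F_{\mathrm{BP}} = {}& \sum_{a \in \mathcal{A}_{\mathrm{BP}}} \sum_{\mathbf{x}_a} b_a(\mathbf{x}_a) \ln \frac{b_a(\mathbf{x}_a)}{f_a(\mathbf{x}_a)} \\
& + \sum_{i \in \mathcal{I}_{\mathrm{BP}} \cap \mathcal{I}_{\mathrm{MF}}} \sum_{a \in \mathcal{N}_{\mathrm{MF}}(i)} \sum_{x_i} b_i(x_i) \ln \frac{b_i(x_i)}{m^{\mathrm{MF}}_{a \to i}(x_i)} \\
& - \sum_{i \in \mathcal{I}_{\mathrm{BP}}} \left(|\mathcal{N}_{\mathrm{BP}}(i)| + |\mathcal{N}_{\mathrm{MF}}(i)| - 1\right) \sum_{x_i} b_i(x_i) \ln b_i(x_i) \\
= {}& \sum_{a \in \mathcal{A}_{\mathrm{BP}}} \sum_{\mathbf{x}_a} b_a(\mathbf{x}_a) \ln \frac{b_a(\mathbf{x}_a)}{f_a(\mathbf{x}_a)} \\
& - \sum_{i \in \mathcal{I}_{\mathrm{BP}} \cap \mathcal{I}_{\mathrm{MF}}} \sum_{a \in \mathcal{N}_{\mathrm{MF}}(i)} \sum_{x_i} b_i(x_i) \ln m^{\mathrm{MF}}_{a \to i}(x_i) \\
& - \sum_{i \in \mathcal{I}_{\mathrm{BP}}} \left(|\mathcal{N}_{\mathrm{BP}}(i)| - 1\right) \sum_{x_i} b_i(x_i) \ln b_i(x_i).
\end{aligned}
\qquad (56)
$$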

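We now show that minimizing $F_{\mathrm{BP}}$ in (56) is equivalent to minimizing $F_{\mathrm{BP,MF}}$ in (17) with respect to $b_a$ and $b_i$ for all $a \in \mathcal{A}_{\mathrm{BP}}$ and $i \in \mathcal{I}_{\mathrm{BP}}$. Obviously,

$$\frac{\partial F_{\mathrm{BP,MF}}}{\partial b_i(x_i)} = \frac{\partial F_{\mathrm{BP}}}{\partial b_i(x_i)}, \quad \text{for all } i \in \mathcal{I}_{\mathrm{BP}} \setminus \mathcal{I}_{\mathrm{MF}}$$

and

$$\frac{\partial F_{\mathrm{BP,MF}}}{\partial b_a(\mathbf{x}_a)} = \frac{\partial F_{\mathrm{BP}}}{\partial b_a(\mathbf{x}_a)}, \quad \text{for all } a \in \mathcal{A}_{\mathrm{BP}}.$$

This follows from the fact that $F_{\mathrm{BP,MF}}$ differs from $F_{\mathrm{BP}}$ by terms that depend only on $b_i$ with $i \in \mathcal{I}_{\mathrm{MF}}$. Now suppose that $i \in \mathcal{I}_{\mathrm{BP}} \cap \mathcal{I}_{\mathrm{MF}}$. In this case, we find that

$$\frac{\partial F_{\mathrm{BP,MF}}}{\partial b_i(x_i)} = \left(1 - |\mathcal{N}_{\mathrm{BP}}(i)|\right)\left(\ln b_i(x_i) + 1\right) - \sum_{a \in \mathcal{N}_{\mathrm{MF}}(i)} \sum_{\mathbf{x}_a \setminus x_i} \prod_{j \in \mathcal{N}(a) \setminus i} b_j(x_j) \ln f_a(\mathbf{x}_a) \qquad (57)$$

and

$$\frac{\partial F_{\mathrm{BP}}}{\partial b_i(x_i)} = \left(1 - |\mathcal{N}_{\mathrm{BP}}(i)|\right)\left(\ln b_i(x_i) + 1\right) - \sum_{a \in \mathcal{N}_{\mathrm{MF}}(i)} \ln m^{\mathrm{MF}}_{a \to i}(x_i). \qquad (58)$$

From (22) we see that

$$m^{\mathrm{MF}}_{a \to i}(x_i) = \exp\left(\sum_{\mathbf{x}_a \setminus x_i} \prod_{j \in \mathcal{N}(a) \setminus i} n_{j \to a}(x_j) \ln f_a(\mathbf{x}_a)\right) \qquad (59)$$

for all $a \in \mathcal{N}_{\mathrm{MF}}(i)$. Note that, according to step 2) in Algorithm 1, the messages $m^{\mathrm{MF}}_{a \to i}(x_i)$ in (59) are fixed inputs for the BP part. Therefore, we are not allowed to plug the expressions for the messages $m^{\mathrm{MF}}_{a \to i}(x_i)$ in (59) into (58) in general. However, since $a \in \mathcal{A}_{\mathrm{MF}}$ and $i \in \mathcal{I}_{\mathrm{BP}} \cap \mathcal{I}_{\mathrm{MF}}$, condition (26) implies that $\mathcal{N}(a) \setminus i \subseteq \mathcal{I}_{\mathrm{MF}} \setminus \mathcal{I}_{\mathrm{BP}}$ and guarantees that

$$n_{j \to a}(x_j) = b_j(x_j)$$

is constant in step 2) of Algorithm 1 for all $j \in \mathcal{N}(a) \setminus i \subseteq \mathcal{I}_{\mathrm{MF}} \setminus \mathcal{I}_{\mathrm{BP}}$. Therefore, we are indeed allowed to plug the expressions of the messages $m^{\mathrm{MF}}_{a \to i}(x_i)$ in (59) into (58) and finally see that also

$$\frac{\partial F_{\mathrm{BP,MF}}}{\partial b_i(x_i)} = \frac{\partial F_{\mathrm{BP}}}{\partial b_i(x_i)}, \quad \text{for all } i \in \mathcal{I}_{\mathrm{BP}} \cap \mathcal{I}_{\mathrm{MF}}.$$

Hence, minimizing $F_{\mathrm{BP}}$ in (56) is equivalent to minimizing $F_{\mathrm{BP,MF}}$ in (17).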

By assumption, the factor graph in the BP part has a tree structure. Therefore, [9, Prop. 3] implies that

1) $F_{\mathrm{BP}} \geq 0$;

2) $F_{\mathrm{BP}} = 0$ if and only if the beliefs $\{b_i, b_a\}$ in (56) are the marginals of the factorization in (55).

Hence, for $b_j$ fixed with $j \in \mathcal{I}_{\mathrm{MF}} \setminus \mathcal{I}_{\mathrm{BP}}$, we see that $F_{\mathrm{BP,MF}}$ in (17) is minimized by the marginals of the factorization in (55).

It remains to show that running the forward/backward algorithm in the BP part as described in step 2) in Algorithm 1 indeed computes the marginals of the factorization in (55). Applying Theorem 1 to the factorization in (55) yields the message passing fixed-point equations

$$n_{i \to a}(x_i) = \prod_{c \in \mathcal{N}_{\mathrm{BP}}(i) \setminus a} m_{c \to i}(x_i) \prod_{c \in \mathcal{N}_{\mathrm{MF}}(i)} m^{\mathrm{MF}}_{c \to i}(x_i)$$
$$m_{a \to i}(x_i) = z_a \sum_{\mathbf{x}_a \setminus x_i} f_a(\mathbf{x}_a) \prod_{j \in \mathcal{N}(a) \setminus i} n_{j \to a}(x_j) \qquad (60)$$

for all $a \in \mathcal{A}_{\mathrm{BP}}$, $i \in \mathcal{N}(a)$. The message passing fixed-point equations in (60) are the same as the message passing fixed-point equations for the BP part in (22) with fixed-input messages $m^{\mathrm{MF}}_{a \to i}(x_i)$ for all $i \in \mathcal{I}_{\mathrm{BP}} \cap \mathcal{I}_{\mathrm{MF}}$ and $a \in \mathcal{N}_{\mathrm{MF}}(i)$. Hence, running the forward/backward algorithm in the BP part indeed computes the marginals of the factorization in (55) and Algorithm 1 is guaranteed to converge.

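F. Product of Gaussian distributions

Lemma 3: Let

$$p_i(\mathbf{x}) = \mathrm{CN}(\mathbf{x}; \boldsymbol{\mu}_i, \boldsymbol{\Lambda}_i^{-1}), \quad \text{for all } i \in [1 : N].$$

Then

$$\prod_{i \in [1:N]} p_i(\mathbf{x}) \propto \mathrm{CN}(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Lambda}^{-1})$$

with

$$\boldsymbol{\mu} = \boldsymbol{\Lambda}^{-1} \sum_{j \in [1:N]} \boldsymbol{\Lambda}_j \boldsymbol{\mu}_j, \qquad \boldsymbol{\Lambda} = \sum_{i \in [1:N]} \boldsymbol{\Lambda}_i.$$

Proof: Follows from direct computation.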
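The lemma can be checked numerically in the scalar case. The sketch below is only an illustration under that restriction (scalar complex Gaussians with real positive precisions); the function names are hypothetical.

```python
import math


def cn_pdf(x, mean, precision):
    """Scalar circularly symmetric complex Gaussian pdf CN(x; mean, 1/precision)."""
    return precision / math.pi * math.exp(-precision * abs(x - mean) ** 2)


def product_parameters(means, precisions):
    """Mean and precision of the Gaussian proportional to the product of the inputs."""
    total_precision = sum(precisions)                  # precisions add
    weighted = sum(l * m for l, m in zip(precisions, means))
    return weighted / total_precision, total_precision  # precision-weighted mean


# Two factors with means 1 and j and precisions 1 and 3.
mu, lam = product_parameters([1 + 0j, 1j], [1.0, 3.0])
```

Evaluating the ratio between the product of the two pdfs and `cn_pdf(x, mu, lam)` at different points `x` gives the same constant, which is what the proportionality in the lemma asserts.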